Approximate N-Clustering on Heterogeneous Information Networks with Star Schema
Author Info
Madhamsetty, Lakshmi Poojitha
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1703170985010961
Abstract Details
Year and Degree
2023, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
Abstract
Clustering techniques are increasingly needed as data accumulates on a large scale. Given a set of objects, clustering divides them into groups called clusters, such that objects within a cluster are similar to one another while objects from different clusters are dissimilar. Cluster analysis is essential in data mining for finding underlying patterns and information. Many complex real-world systems consist of objects of multiple data types and the interactions between them, and such systems can be modeled as heterogeneous information networks (HINs). A HIN is a network that consists of nodes of different object types and links representing relations between the nodes. Cluster analysis of HINs helps reveal the underlying information in these complex systems. Most real-world applications that handle big data, including social networks, medical information systems, online e-commerce systems, and most movie database systems (such as IMDb and Netflix), can be structured as heterogeneous information networks. Effective cluster analysis of large-scale heterogeneous information networks therefore poses an interesting challenge. In this research, we have developed an ‘approximate N-Clustering’ model based on the A* (pronounced ‘A-star’) search algorithm, with the Chernoff upper bound used as the approximation-limiting criterion (the heuristic function). Here ‘N’ represents the number of databases/dimensions/object types. In this thesis, we use a star distribution pattern (or star schema) for clustering on HINs. In a star network schema, there is one central object type, and all other object types are connected to this central object type. The approximate N-clusters generated by our algorithm are the most informative occurrences (i.e., the probability of occurrence of any new N-cluster with a higher priority will not increase).
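For readers unfamiliar with the bound named above, the following is a minimal sketch of one standard form of the Chernoff upper-tail bound. The abstract does not state which form the thesis uses or how it enters the A* heuristic, so the function name `chernoff_upper_tail`, its parameters, and the choice of this particular form are assumptions made for illustration only. For a sum X of independent 0/1 random variables with mean mu, and any delta > 0, the probability that X reaches (1 + delta) * mu is at most exp(-delta^2 * mu / (2 + delta)).

```python
import math


def chernoff_upper_tail(mu: float, delta: float) -> float:
    """Upper bound on P(X >= (1 + delta) * mu).

    X is assumed to be a sum of independent 0/1 random variables
    with expected value mu. This is one textbook form of the bound;
    it is an illustrative assumption, not the thesis's own formula.
    """
    if mu <= 0 or delta <= 0:
        raise ValueError("mu and delta must both be positive")
    return math.exp(-(delta ** 2) * mu / (2.0 + delta))


# Hypothetical usage: bound the chance that a count with expected
# value 10 turns out to be at least twice as large (delta = 1).
bound = chernoff_upper_tail(mu=10.0, delta=1.0)
```

The bound shrinks exponentially as the expected value grows, which is what makes a bound of this kind usable as a cutoff: once it falls below a chosen tolerance, further candidates can be treated as too unlikely to matter.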
Our algorithm can be particularly useful in domains such as medicine, where information and patterns among genes, diseases, mutations, chemicals, etc. can be mined and analyzed as a whole. It can also be applied in other areas where the relationships between different factors can be studied together, such as streaming networks, where a user's age, location, gender, and genre preferences can help suggest a movie or a series.
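The star schema described in the abstract can be pictured with a small data structure. The sketch below is illustrative only and is not the thesis's implementation: the class `StarSchemaHIN`, its methods, and the choice of "user" as the central object type for the streaming example are all assumptions. Every link joins one central object to one object of a surrounding type, and `shared_attributes` returns, for each surrounding type, the objects linked to every central object in a given group, which is one simple way to think about a candidate N-cluster.

```python
from collections import defaultdict


class StarSchemaHIN:
    """Toy heterogeneous information network with a star schema.

    Hypothetical structure: one central object type, with every
    other (attribute) type linked only to the central type.
    """

    def __init__(self, center_type, attribute_types):
        self.center_type = center_type
        self.attribute_types = list(attribute_types)
        # links[attr_type][center_obj] -> set of linked attribute objects
        self.links = {t: defaultdict(set) for t in self.attribute_types}

    def add_link(self, center_obj, attr_type, attr_obj):
        """Link a central object to an object of a surrounding type."""
        if attr_type not in self.links:
            raise ValueError(f"unknown attribute type: {attr_type}")
        self.links[attr_type][center_obj].add(attr_obj)

    def shared_attributes(self, center_objs):
        """Per attribute type, objects linked to all given central objects."""
        center_objs = list(center_objs)
        shared = {}
        for attr_type, by_center in self.links.items():
            sets = [by_center.get(c, set()) for c in center_objs]
            shared[attr_type] = set.intersection(*sets) if sets else set()
        return shared


# Hypothetical streaming data: users are the central object type.
hin = StarSchemaHIN("user", ["age_group", "location", "genre"])
hin.add_link("u1", "age_group", "18-25")
hin.add_link("u1", "genre", "drama")
hin.add_link("u1", "genre", "comedy")
hin.add_link("u2", "age_group", "18-25")
hin.add_link("u2", "genre", "drama")
hin.add_link("u2", "location", "Ohio")

common = hin.shared_attributes(["u1", "u2"])
```

Here `common` maps "age_group" to {"18-25"}, "genre" to {"drama"}, and "location" to an empty set, because only one of the two users has a location link.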
Committee
Raj Bhatnagar, Ph.D. (Committee Chair)
Chong Yu, Ph.D. (Committee Member)
Vikram Ravindra, Ph.D. (Committee Member)
Pages
155 p.
Subject Headings
Computer Science
Keywords
Approximate N-Clustering; Chernoff Bounds; Heuristics; Clustering; A-Star Search; Scalable N-Clustering
Recommended Citations
Citations
Madhamsetty, L. P. (2023). Approximate N-Clustering on Heterogeneous Information Networks with Star Schema [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1703170985010961
APA Style (7th edition)
Madhamsetty, Lakshmi Poojitha. Approximate N-Clustering on Heterogeneous Information Networks with Star Schema. 2023. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1703170985010961.
MLA Style (8th edition)
Madhamsetty, Lakshmi Poojitha. "Approximate N-Clustering on Heterogeneous Information Networks with Star Schema." Master's thesis, University of Cincinnati, 2023. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1703170985010961
Chicago Manual of Style (17th edition)
Document number:
ucin1703170985010961
Download Count:
90
Copyright Info
© 2023, some rights reserved.
Approximate N-Clustering on Heterogeneous Information Networks with Star Schema by Lakshmi Poojitha Madhamsetty is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Cincinnati and OhioLINK.