Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
dissertation.pdf (1.13 MB)
ETD Abstract Container
Abstract Header
Clustering Consistently
Author Info
Eldridge, Justin, Eldridge
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249
Abstract Details
Year and Degree
2017, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
Clustering is the task of organizing data into natural groups, or clusters. A central goal in developing a theory of clustering is the derivation of correctness guarantees which ensure that clustering methods produce the right results. In this dissertation, we analyze the setting in which the data are sampled from some underlying probability distribution. In this case, an algorithm is "correct" (or consistent) if, given larger and larger data sets, its output converges in some sense to the ideal cluster structure of the distribution. In the first part, we study the setting in which data are drawn from a probability density supported on a subset of a Euclidean space. The natural cluster structure of the density is captured by the so-called high density cluster tree, which is due to Hartigan (1981). Hartigan introduced a notion of convergence to the density cluster tree, and recent work by Chaudhuri and Dasgupta (2010) and Kpotufe and von Luxburg (2011) has contructed algorithms which are consistent in this sense. We will show that Hartigan's notion of consistency is in fact not strong enough to ensure that an algorithm recovers the density cluster tree as we would intuitively expect. We identify the precise deficiency which allows this, and introduce a new, stronger notion of convergence which we call consistency in merge distortion. Consistency in merge distortion implies Hartigan's consistency, and we prove that the algorithm of Chaudhuri and Dasgupta (2010) satisfies our new notion. In the sequel, we consider the clustering of graphs sampled from a very general, non-parametric random graph model called a graphon. Unlike in the density setting, clustering in the graphon model is not well-studied. We therefore rigorously analyze the cluster structure of a graphon and formally define the graphon cluster tree. We adapt our notion of consistency in merge distortion to the graphon setting and identify efficient, consistent algorithms.
Committee
Mikhail Belkin, PhD (Advisor)
Yusu Wang, PhD (Advisor)
Facundo Mémoli, PhD (Committee Member)
Vincent Vu, PhD (Committee Member)
Pages
131 p.
Subject Headings
Artificial Intelligence
;
Computer Science
;
Statistics
Keywords
machine learning
;
unsupervised learning
;
statistical learning
;
clustering
;
graphon
;
mergeon
;
density cluster tree
;
hierarchical clustering
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Eldridge, Eldridge, J. (2017).
Clustering Consistently
[Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249
APA Style (7th edition)
Eldridge, Eldridge, Justin.
Clustering Consistently.
2017. Ohio State University, Doctoral dissertation.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249.
MLA Style (8th edition)
Eldridge, Eldridge, Justin. "Clustering Consistently." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
osu1512070374903249
Download Count:
349
Copyright Info
© 2017, some rights reserved.
Clustering Consistently by Justin Eldridge Eldridge is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.