Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
Cunningham_James_MS_Thesis.pdf (14.12 MB)
ETD Abstract Container
Abstract Header
Efficient, Parameter-Free Online Clustering
Author Info
Cunningham, James
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1606762403895603
Abstract Details
Year and Degree
2020, Master of Science, Ohio State University, Computer Science and Engineering.
Abstract
As the number of data sources and dynamic datasets grows so does the need for efficient online algorithms to process them. Clustering is a very important data exploration tool used in various fields from data science to machine learning. Clustering finds structure unsupervised among indecipherable collections of data. Online clustering is the process of grouping samples in a data stream into meaningful collections as they appear over time. As more data is collected in these streams online clustering becomes more important than ever. Unfortunately, the number of available efficient online clustering algorithms is limited due to the difficulty of their design that often requires significant trade-offs in clustering performance for efficiency. Those that do exist require expert domain knowledge of the data space to set hyperparameters for the algorithm such as the desired number of clusters. This domain knowledge is often unavailable, so resources must be spent tuning hyperparameters to get acceptable performance. In this thesis we present an online modification to FINCH, the recent state-of-the-art parameter-free clustering algorithm by Sarfraz et al. called Stream FINCH (S-FINCH). We examine the stages of the FINCH algorithm and leverage key insights to produce an algorithm which reduces the online update complexity of FINCH. We then compare the performance of S-FINCH and FINCH over several toy and real-world datasets. We show theoretically and empirically that our S-FINCH algorithm is more efficient than the FINCH algorithm in the online domain and has reasonable real-time update performance. We also present several alternative cluster representatives which can be used to build different agglomerative cluster hierarchies using the S-FINCH algorithm. We compare the cluster quality and clustering time performance of these new representatives with the original FINCH algorithm. The S-FINCH algorithm presented in this thesis allows for fast and efficient online clustering of arbitrary datasets while requiring no user-defined hyperparameters.
Committee
James Davis, PhD. (Advisor)
Juan Vasquez, PhD. (Committee Member)
Kyle Tarplee, PhD. (Committee Member)
Wei-Lun Chao, PhD. (Committee Member)
Pages
103 p.
Subject Headings
Computer Science
;
Information Science
;
Information Technology
Keywords
clustering
;
parameter-free
;
hierarchical clustering
;
unsupervised
;
data exploration
;
online clustering
;
efficient
;
streaming
;
hierarchical algorithms
;
online algorithms
;
graph partitioning
;
partitioning algorithms
;
optimization
;
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Cunningham, J. (2020).
Efficient, Parameter-Free Online Clustering
[Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606762403895603
APA Style (7th edition)
Cunningham, James.
Efficient, Parameter-Free Online Clustering.
2020. Ohio State University, Master's thesis.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=osu1606762403895603.
MLA Style (8th edition)
Cunningham, James. "Efficient, Parameter-Free Online Clustering." Master's thesis, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606762403895603
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
osu1606762403895603
Download Count:
210
Copyright Info
© 2020, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.