Files
Thesis.pdf (714.07 KB)
Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects
Author Info
Sulecki, Nathan
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462
Abstract Details
Year and Degree
2017, Bachelor of Science (BS), Ohio University, Computer Science.
Abstract
Dimensionality reduction algorithms are used in every field that uses data, for purposes ranging from facilitating data visualization to reducing the amount of data that must be considered in analysis. Research has established that there is no definitive best algorithm; any algorithm can be the front runner depending on what dataset is used. Despite this, relatively little research has been conducted to identify dataset aspects that can predict algorithm performance. This research aims to serve as a foundational work for addressing this gap. Three dataset aspects (number of dimensions, continuity of dimensions, multivariate normality) were selected as potential factors that can affect performance based on how most algorithms vary in approach. Data sets were selected or created to cover a spread of the first two aspects (small, medium, and large number of dimensions, and binary, n-ary, continuous, and mixed continuity) and then tested to see if they exhibit multivariate normality. These data sets were then reduced using one recent and four well-known dimensionality reduction algorithms: SMA, PCA, mRMR, kPCA, and nlPCA. The reduced data was used as input to a range of classification and clustering algorithms, and the performance of these algorithms was measured and compared. It was found that, for the tested data sets, neither continuity nor dimensionality served as a predictor of algorithm performance; however, results point to another, previously unexplored way of characterizing data sets that could be a significant predictor of performance.
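The reduce-then-evaluate procedure described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: it covers only PCA (one of the five algorithms named), uses a nearest-centroid classifier as a stand-in because the abstract does not name the classifiers used, and runs on synthetic Gaussian data. The function names `pca_reduce`, `nearest_centroid_accuracy`, `make_toy_data`, and `evaluate` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca_reduce(X_train, X_test, n_components):
    """Project both splits onto the top principal components of the training split."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:n_components].T
    return (X_train - mean) @ W, (X_test - mean) @ W


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Assign each test point to its closest class centroid and return the accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def make_toy_data(n_per_class=200, n_dims=20, seed=0):
    """Synthetic two-class Gaussian data (assumption: a stand-in, not a thesis data set)."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(n_dims)
    shift[:3] = 2.0  # only the first three dimensions separate the classes
    X = np.vstack([
        rng.normal(size=(n_per_class, n_dims)),
        rng.normal(size=(n_per_class, n_dims)) + shift,
    ])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    perm = rng.permutation(len(y))
    return X[perm], y[perm]


def evaluate(n_components_list=(1, 2, 5, 20)):
    """Reduce to each target dimensionality, then score the downstream classifier."""
    X, y = make_toy_data()
    split = len(y) // 2
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
    results = {}
    for k in n_components_list:
        Z_tr, Z_te = pca_reduce(X_tr, X_te, k)
        results[k] = nearest_centroid_accuracy(Z_tr, y_tr, Z_te, y_te)
    return results


if __name__ == "__main__":
    for k, acc in evaluate().items():
        print(f"{k:>2} components: accuracy {acc:.3f}")
```

Fitting the projection on the training split only, then applying it to the test split, keeps the downstream score from being influenced by test data. Comparing algorithms as the abstract describes would mean repeating this loop with each reduction method and each data set in place of the toy data.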
Committee
Ronaldo Vigo (Advisor)
Pages
40 p.
Subject Headings
Computer Science
Keywords
data science; dimensionality reduction; computer science; machine learning
Recommended Citations
Sulecki, N. (2017). Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects [Undergraduate thesis, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462
APA Style (7th edition)
Sulecki, Nathan. Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects. 2017. Ohio University, Undergraduate thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462.
MLA Style (8th edition)
Sulecki, Nathan. "Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects." Undergraduate thesis, Ohio University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462
Chicago Manual of Style (17th edition)
Document number: ouhonors1493397823307462
Download Count: 427
Copyright Info
© 2017, all rights reserved.
This open access ETD is published by Ohio University Honors Tutorial College and OhioLINK.