Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects

2017, Bachelor of Science (BS), Ohio University, Computer Science.
Dimensionality reduction algorithms are used in every field that works with data, for purposes ranging from facilitating data visualization to reducing the amount of data that must be considered in analysis. Research has established that there is no definitive best algorithm: any algorithm can be the front-runner depending on which data set is used. Despite this, relatively little research has looked for data set aspects that can predict algorithm performance. This research aims to serve as a foundational work for addressing this gap. Three data set aspects (number of dimensions, continuity of dimensions, and multivariate normality) were selected as potential factors affecting performance, based on how most algorithms vary in approach. Data sets were selected or created to cover a spread of the first two aspects (small, medium, and large numbers of dimensions; binary, n-ary, continuous, and mixed continuity) and then tested for multivariate normality. These data sets were then reduced using one recent and four well-known dimensionality reduction algorithms: SMA, PCA, mRMR, kPCA, and nlPCA. The reduced data were used as input to a range of classification and clustering algorithms, and the performance of these algorithms was measured and compared. It was found that, for the tested data sets, neither continuity nor dimensionality served as a predictor of algorithm performance; however, the results point to another, previously unexplored way of characterizing data sets that could be a significant predictor of performance.
Ronaldo Vigo (Advisor)
40 p.
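
The reduce-then-evaluate procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the data are synthetic Gaussian blobs, only PCA (one of the five algorithms named) is shown, a nearest-centroid rule stands in for the classification algorithms, and the helper names (`fit_pca`, `transform`, `nearest_centroid_accuracy`, `evaluate`) are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the top-k principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def transform(X, mean, components):
    """Project data onto the principal directions learned from the training set."""
    return (X - mean) @ components

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier: assign each test point to the closest class centroid."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predictions = labels[dists.argmin(axis=1)]
    return float((predictions == y_test).mean())

def evaluate(n_dims, k, seed=0):
    """Build a synthetic two-class data set, reduce it to k dimensions, score it."""
    rng = np.random.default_rng(seed)
    n_per_class = 100
    X = np.vstack([
        rng.normal(0.0, 1.0, (n_per_class, n_dims)),  # class 0 (synthetic)
        rng.normal(5.0, 1.0, (n_per_class, n_dims)),  # class 1 (synthetic, shifted)
    ])
    y = np.repeat([0, 1], n_per_class)
    order = rng.permutation(len(y))
    X, y = X[order], y[order]

    split = len(y) // 2  # first half trains, second half tests
    mean, components = fit_pca(X[:split], k)
    Z_train = transform(X[:split], mean, components)
    Z_test = transform(X[split:], mean, components)
    accuracy = nearest_centroid_accuracy(Z_train, y[:split], Z_test, y[split:])
    return Z_train.shape, accuracy

# Vary one data set aspect (number of dimensions) and compare downstream scores.
results = {n_dims: evaluate(n_dims, k=2) for n_dims in (5, 50, 500)}
for n_dims, (shape, accuracy) in results.items():
    print(n_dims, shape, accuracy)
```

Fitting the projection on the training half only, and reusing it for the test half, keeps the downstream score from being inflated by information leaking from the test data. Comparing several reduction algorithms would mean swapping `fit_pca` and `transform` for another method while holding the rest of the loop fixed.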

Recommended Citations

  • Sulecki, N. (2017). Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects [Undergraduate thesis, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462

    APA Style (7th edition)

  • Sulecki, Nathan. Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects. 2017. Ohio University, Undergraduate thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462.

    MLA Style (8th edition)

  • Sulecki, Nathan. "Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects." Undergraduate thesis, Ohio University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1493397823307462

    Chicago Manual of Style (17th edition)