Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering
The growing need to understand and process data has driven innovation in many disparate areas of data science. The computational biology, graphics, and machine learning communities, among others, are striving to develop robust and efficient methods for such analysis. In this work, we demonstrate the utility of topological data analysis (TDA), a new and powerful tool for understanding the shape and structure of data, in these diverse areas.
First, we develop a new way to use persistent homology, a core tool in topological data analysis, to extract machine learning features for image classification. Our work focuses on improving modern image classification techniques by incorporating topological features. We show that adding this information to supervised learning models improves their classification performance, providing evidence that topological signatures can enhance some of the pioneering applications in computer vision.
Next, we propose a topology-based, fast, scalable, and parameter-free technique to explore a related problem in protein analysis and classification. We build an initial simplicial complex from the constituent atoms and bonds of a protein, use simplicial collapse to construct a filtration on it, and compute the persistent homology of that filtration, which serves as our signature for the protein molecule. Besides being scalable, our method shows sizable time and memory improvements over similar topology-based approaches. We use the signature to train a protein domain classifier and compare it against state-of-the-art structure-based protein signatures, achieving a substantial improvement in accuracy.
Beyond the intervals of persistent homology considered in our first two applications, some applications need representative cycles for those intervals. These cycles, especially the minimal ones, are useful geometric features that augment the intervals of a purely topological barcode. We address the problem of computing th…
Committee: Tamal Dey (Advisor); Yusu Wang (Committee Member); Raphael Wenger (Committee Member)
Subjects: Computer Engineering; Computer Science