Doctor of Philosophy, The Ohio State University, 2017, Computer Science and Engineering
Topological data analysis (TDA) provides a new methodology to data analysis problems. It captures intrinsic topological structures in data, which can then offer useful guidelines for other data analysis approaches. One main task in TDA is to extract topological features from data. And this is often done through one of the main tools of TDA, persistent homology, or simply, persistence. While the idea of persistent homology has been demonstrated valuable in a broad range of areas, its application to large scale real-world data is limited by the efficiency of existing persistence computation approaches.
I will present a focused study during my PhD research on broadening applicability of the idea of persistence in data analysis in two fronts, to explore novel ways of applying persistent homology for qualitative data analysis and to study the computational issues related to persistent homology, so as to significantly broaden their applications to large-scale low or high dimensional data analysis.
In the first direction, we applied persistent homology to a special kind of data, called \emph{metric graphs}. A metric graph offers one of the simplest yet still meaningful ways to represent the non-linear structure hidden behind the data. Thus, comparing two graphs has many important applications in data analysis. In this work, we proposed a new distance between two finite metric graphs, called the \spdist{} distance, which formulates the graph matching problem through persistent homology. Our \spdist{} distance has two properties not shared by previous methods: First, it is stable against the perturbations of the input graph metrics. Second, it is a \emph{continuous} distance measure, in the sense that it is defined on an alignment of the underlying spaces of input graphs, instead of merely their nodes. This makes our \spdist{} distance robust against, for example, different discretizations of the same underlying graph. Despite considering the input graphs as continuous (open full item for complete abstract)
Committee: Tamal Dey Dr. (Advisor); Wang Yusu Dr. (Advisor); Wenger Rephael Dr. (Committee Member); Bozanic Zahn Dr. (Committee Member)
Subjects: Computer Science