Search Results

(Total results 21)

  • 1. Brown, Kyle Topological Hierarchies and Decomposition: From Clustering to Persistence

    Doctor of Philosophy (PhD), Wright State University, 2022, Computer Science and Engineering PhD

    Hierarchical clustering algorithms are commonly used in exploratory data analysis (EDA) and supervised learning. However, they suffer from several drawbacks, including the difficulty of interpreting the resulting dendrogram, the arbitrariness of the cut chosen to obtain a flat clustering, and the lack of an obvious way to compare individual clusters. In this dissertation, we develop the notion of a topological hierarchy on recursively defined subsets of a metric space. We look to the field of topological data analysis (TDA) for the mathematical background needed to associate topological structures, such as simplicial complexes and maps of covers, with the clusters in a hierarchy. Our main results include a novel algorithm for constructing a topological hierarchy, together with an implementation of the MAPPER algorithm and of our topological hierarchies in pure Python, as well as a web-app dashboard for exploratory data analysis. We show that the algorithm scales well to high-dimensional data, owing to the dimensionality reduction used in most TDA methods, and we analyze the worst-case time complexity of MAPPER and of our hierarchical decomposition algorithm. Finally, we give a use case for exploratory data analysis with our techniques.

    Committee: Derek Doran Ph.D. (Advisor); Michael Raymer Ph.D. (Committee Member); Vincent Schmidt Ph.D. (Committee Member); Nikolaos Bourbakis Ph.D. (Committee Member); Thomas Wischgoll Ph.D. (Committee Member) Subjects: Computer Science
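
    The MAPPER algorithm mentioned in this entry admits a compact description: cover the range of a lens function with overlapping intervals, cluster each preimage, and connect clusters that share points. Below is a minimal illustrative sketch in Python; the interval cover, the DBSCAN clusterer, and all parameter values are assumptions of this sketch, not Brown's implementation.

    ```python
    # Minimal MAPPER sketch (illustrative; not the dissertation's code).
    import numpy as np
    from sklearn.cluster import DBSCAN

    def mapper_graph(X, lens, n_intervals=10, overlap=0.3, eps=0.5):
        lo, hi = lens.min(), lens.max()
        width = (hi - lo) / n_intervals
        nodes, edges = [], set()
        for i in range(n_intervals):
            # overlapping interval of the 1-D lens cover
            a = lo + i * width - overlap * width
            b = lo + (i + 1) * width + overlap * width
            idx = np.where((lens >= a) & (lens <= b))[0]
            if len(idx) == 0:
                continue
            labels = DBSCAN(eps=eps).fit_predict(X[idx])
            for lab in set(labels) - {-1}:           # drop DBSCAN noise points
                nodes.append(set(idx[labels == lab]))
        for i, u in enumerate(nodes):
            for j in range(i + 1, len(nodes)):
                if u & nodes[j]:                     # shared points -> edge
                    edges.add((i, j))
        return nodes, edges

    X = np.random.rand(200, 5)
    nodes, edges = mapper_graph(X, lens=X[:, 0])
    print(len(nodes), "nodes,", len(edges), "edges")
    ```
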
  • 2. Synakowski, Stuart Novel Instances and Applications of Shared Knowledge in Computer Vision and Machine Learning Systems

    Doctor of Philosophy, The Ohio State University, 2021, Electrical and Computer Engineering

    The fields of computer vision and machine learning have made enormous strides in developing models that solve tasks only humans had been capable of solving. However, the models constructed to solve these tasks came at an enormous price in computational resources and data collection. Motivated by the unsustainability of continually developing models from scratch to tackle every additional task humans can solve, researchers are interested in efficiently constructing new models for new tasks. The sub-fields of machine learning devoted to this line of research go by many names, including multi-task learning, transfer learning, and few-shot learning. All of these frameworks share the assumption that knowledge should be shared across models that solve a set of tasks. We define knowledge as the set of conditions used to construct a model that solves a given task. By shared knowledge, we refer to conditions that are consistently used to construct a set of models that solve a set of tasks. In this work, we address two sets of tasks posed in the fields of computer vision and machine learning. In solving each of them, we show how our methods exhibit a novel implementation of shared knowledge, with many implications for future work on systems that further emulate the abilities of human beings. The first set of tasks falls within the sub-field of action analysis, specifically the recognition of intent. Instead of a data-driven approach, we construct a hand-crafted model that infers intentional versus non-intentional movement using common knowledge concepts known by humans. These knowledge concepts are ultimately used to construct an unsupervised method to infer between intentional and non-intentional movement across levels of abstraction. By levels of abstraction, we mean that the model needed to solve the most abstract instances of intent recognition is useful in developing models whi (open full item for complete abstract)

    Committee: Aleix Martinez (Advisor); Abhishek Gupta (Committee Member); Yingbin Liang (Committee Member) Subjects: Artificial Intelligence; Computer Engineering; Computer Science
  • 3. Mandal, Sayan Applications of Persistent Homology and Cycles

    Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering

    The growing need to understand and process data has driven innovation in many disparate areas of data science. The computational biology, graphics, and machine learning communities, among others, are striving to develop robust and efficient methods for such analysis. In this work, we demonstrate the utility of topological data analysis (TDA), a new and powerful tool for understanding the shape and structure of data, in these diverse areas. First, we develop a new way to use persistent homology, a core tool of topological data analysis, to extract machine learning features for image classification. Our work focuses on improving modern image classification techniques by considering topological features. We show that incorporating this information into supervised learning models improves classification, providing evidence that topological signatures can be leveraged to enhance some of the pioneering applications in computer vision. Next, we propose a topology-based, fast, scalable, and parameter-free technique for a related problem in protein analysis and classification. On an initial simplicial complex built from the constituent protein atoms and bonds, simplicial collapse is used to construct a filtration from which we compute persistent homology; this is ultimately our signature for the protein molecules. Our method, besides being scalable, shows sizable time and memory improvements over similar topology-based approaches. We use the signature to train a protein domain classifier and compare against state-of-the-art structure-based protein signatures, achieving a substantial improvement in accuracy. Beyond the intervals of persistent homology used in our first two applications, some applications need representative cycles for them. These cycles, especially the minimal ones, are useful geometric features that augment the intervals in a purely topological barcode. We address the problem of computing th (open full item for complete abstract)

    Committee: Tamal Dey (Advisor); Yusu Wang (Committee Member); Rephael Wenger (Committee Member) Subjects: Computer Engineering; Computer Science
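
    As an illustration of the feature-extraction idea in this entry, the hedged sketch below computes persistence diagrams with the `ripser` package (an assumption of this sketch, not necessarily the author's toolchain) and turns finite lifespans into simple summary statistics that a classifier could consume.

    ```python
    # Persistence-based features for a classifier (illustrative sketch).
    import numpy as np
    from ripser import ripser

    def topological_features(points, maxdim=1):
        dgms = ripser(points, maxdim=maxdim)['dgms']
        feats = []
        for dgm in dgms:
            finite = dgm[np.isfinite(dgm[:, 1])]     # drop the infinite H0 bar
            pers = finite[:, 1] - finite[:, 0]       # lifespans of finite intervals
            feats += [len(pers), pers.sum(), pers.max(initial=0.0),
                      pers.mean() if len(pers) else 0.0]
        return np.array(feats)

    X = np.random.rand(100, 3)
    print(topological_features(X))                   # 8 features covering H0 and H1
    ```
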
  • 4. Singh, Rohit Optimizing Topological Spaces for Scalable Persistent Homology Computations

    PhD, University of Cincinnati, 2024, Engineering and Applied Science: Computer Science and Engineering

    Topological Data Analysis (TDA) explores the topological properties of data treated as a topological space. Persistent Homology (PH) is a component of TDA that computes the topological features (connected components, loops, voids, and their higher-dimensional counterparts) found in data. Unfortunately, computing PH in dimensional spaces above R7 is impractical due to the size and computational cost of generating the filtration (filtered complexes) of the data. While there are many types of complexes, they are typically classified into two broad categories, namely: (i) the purely combinatorial Abstract complexes, and (ii) Geometric complexes that have a realization/embedding in a given space. In general, Abstract complexes are relatively easy to generate but suffer from exponential memory growth, making them memory inefficient in higher dimensions and for big data. In contrast, Geometric complexes generally have a significantly more compact representation but their construction incurs significant computational costs that become impractical in higher dimensions. This dissertation works to address the computational and memory costs experienced in the construction of filtered complexes used by TDA algorithms. In particular, this study will examine two main approaches to build a filtered complex of data, namely: (i) the construction of approximate geometric sparsified simplicial complexes for high dimensional and big data, and (ii) the construction and use of approximate Polytopal Complexes where convex polytopes represent cells in the different dimensions of the data. The overall objective of this work is to investigate techniques for quickly constructing geometric complexes that are memory efficient and suitable for use with higher dimensional big data than currently possible. Although several techniques for the sparsification of simplicial complexes are well known, this work contributes a solution based on ß-skeletons (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Vikram Ravindra Ph.D. (Committee Member); Bledar Konomi Ph.D. (Committee Member); Ali Minai Ph.D. (Committee Member); Badri Vellambi Ravisankar Ph.D. (Committee Member) Subjects: Computer Engineering
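
    For intuition about the β-skeleton sparsification named at the end of this abstract, the sketch below builds a Gabriel graph, the β = 1 member of the β-skeleton family: an edge (p, q) survives only if no third point lies inside the ball whose diameter is pq. It is a toy illustration, not the dissertation's construction.

    ```python
    # Gabriel graph (beta = 1 skeleton) as a graph sparsifier (illustrative).
    import numpy as np

    def gabriel_edges(X):
        n = len(X)
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                mid = (X[i] + X[j]) / 2.0                    # center of the pq-ball
                r2 = np.sum((X[i] - X[j]) ** 2) / 4.0        # squared radius
                d2 = np.sum((X - mid) ** 2, axis=1)
                d2[[i, j]] = np.inf                          # ignore the endpoints
                if np.all(d2 > r2):                          # empty ball -> keep edge
                    edges.append((i, j))
        return edges

    X = np.random.rand(50, 4)
    print(len(gabriel_edges(X)), "edges kept of", 50 * 49 // 2, "candidates")
    ```
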
  • 5. Clause, Nathaniel New Invariants and Algorithms for Persistence over Posets

    Doctor of Philosophy, The Ohio State University, 2024, Mathematics

    Persistent homology is a central tool in topological data analysis that allows us to study the shape of data. In the one-parameter setting, there is a lossless, discrete representation of the persistent homology of an input space in the form of a persistence diagram, which consists of a multiset of points in the plane. Despite having nice theoretical properties and demonstrating practical usefulness in numerous domains, it is well-known that one-parameter persistence can be greatly disrupted by noise or outliers in the input data. A standard way to mitigate these issues is to move to multiparameter persistence, where we utilize an indexing poset for our persistence module that is not totally ordered. The downside of multiparameter persistence is that, in general, there is no lossless, discrete representation such as the persistence diagram in the one-parameter setting. In light of this fact, researchers studying multiparameter persistence aim to introduce discrete invariants for multiparameter persistence and study how much information they are able to retain, as well as the practicality of their computation. This dissertation follows this program of research. In particular, advancements towards this end in this dissertation include: (a) Extending the notion of generalized persistence diagrams by Kim and Mémoli to the setting where the indexing poset is not locally finite, and precisely characterizing the discriminating strength of the resulting invariant, (b) Introducing the meta-rank and meta-diagrams, new invariants for 2-parameter persistence, studying their theoretical properties, and introducing an algorithm for their computation, (c) Extending the notion of persistence sets by Gómez and Mémoli to the multiparameter setting, demonstrating stability for multiparameter persistence sets and showcasing their strength and computability in a shape classification experiment, (d) Providing implementations and experiments for tools utilizi (open full item for complete abstract)

    Committee: Facundo Mémoli (Advisor); Sanjeevi Krishnan (Committee Member); Matthew Kahle (Committee Member) Subjects: Mathematics
  • 6. Yadav, Anurag Memory Efficient Construction of Delaunay Triangulation in High Dimensions

    MS, University of Cincinnati, 2024, Engineering and Applied Science: Computer Engineering

    Topological Data Analysis (TDA) is a sub-field of data mining that studies topological invariants present in data. Generally, TDA methods treat discrete data as a sampling of a manifold and form a filtration of the data through a series of sub-complexes. These sub-complexes are constructed over an increasing set of connectivity distances (ranging from disconnected to fully connected). By examining the filtration, TDA methods work to identify invariant topological features of the dataset. In computational TDA, the two most common tools are Persistent Homology (PH) and Euler Characteristic Curves (ECC). PH encapsulates the study of features that remain invariant under continuous deformation of an object and is built upon sub-complexes that represent a continuous span over the input dataset. The ECC plots the alternating sum of the Betti numbers found at each sub-complex of the filtration as a curve. While several types of complexes can be formed from data, the Vietoris-Rips (VR) and Alpha complexes are the most prevalent. VR complexes are formed by combinatorial methods, which makes their construction fast; however, their space complexity is large and sensitive to dimension. In contrast, the Alpha complex is substantially smaller than the corresponding VR complex, but it requires computing a Delaunay Triangulation (DT) in the data's dimension. Unfortunately, computing the DT requires a substantial amount of runtime. Since much of the work with TDA tools is done on low-dimensional data in R^3, a VR complex is used in the construction of the filtration. However, when data resides in higher dimensions, the VR complex becomes impractical and alternative complexes must be pursued. The Alpha complex, for instance, can be formed for dimensions R^3 through R^9 using conventional construction methods for the underlying DT; in higher dimensions, however, these construction methods fail. This thesis introduces a new algorithm for computing the Delaunay Triangulation, called the Helix Algorithm, that provides the ability to process higher dim (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Badri Vellambi Ravisankar Ph.D. (Committee Member); Vikram Ravindra Ph.D. (Committee Member) Subjects: Computer Engineering
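
    The cost of conventional Delaunay construction in higher dimensions can be seen with a small experiment using `scipy.spatial.Delaunay` (a Qhull wrapper, chosen here purely for illustration and unrelated to the Helix Algorithm): the simplex count grows rapidly with dimension.

    ```python
    # Delaunay simplex growth with dimension (illustrative experiment).
    import numpy as np
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(0)
    for d in range(2, 7):
        pts = rng.random((200, d))               # 200 uniform points in [0,1]^d
        tri = Delaunay(pts)
        print(f"dimension {d}: {len(tri.simplices)} top-dimensional simplices")
    ```
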
  • 7. Malott, Nicholas Euler TDA for PH Approximation

    PhD, University of Cincinnati, 2023, Engineering and Applied Science: Computer Science and Engineering

    Topological Data Analysis (TDA) is a field of data mining that captures the algebraic structure of data by employing topological invariants. Topological invariants are measurable properties of a space, such as cardinality, connectedness, countability, or homology. These characteristics can identify data relationships not found through traditional data analysis methods, providing a unique perspective on the underlying structural representations. Persistent Homology (PH) stands out as a key tool of TDA, as it examines the emergence and collapse of homology classes persisting over a topological filtration. PH yields a set of persistence intervals representing crucial features in the data, including connected components, loops, voids, and higher-dimensional structures that are irreducible through algebraic reduction. Persistent homology has proven effective in data studies across various domains, including network analysis, bioinformatics, image recognition, and object classification. However, the memory complexity of PH inhibits the study of higher-order homology features, leading to bounded analysis in practice. Applications employing PH have been limited to identifying connected components, loops, and voids, which correspond to the first three dimensions of homology classes. Higher-order homology classes exhibit exponential space complexity relative to both the size and the dimension of the data, posing a significant limitation on their study. The Euler Characteristic (EC) detects the general presence of homology classes in a topological space. When the EC is computed over a topological filtration, it captures the alternating sum of homology classes, related to the results of PH. This metric, known as the Euler Characteristic Curve (ECC), has found application in several prominent instances of TDA. While the ECC does not identify distinct individual topological features, it is more memory-efficient than PH and can characterize specific filtration ranges (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Gowtham Atluri Ph.D. (Committee Member); Badri Vellambi Ravisankar Ph.D. (Committee Member); Wen-Ben Jone Ph.D. (Committee Member); Alex Dempsey Ph.D. (Committee Member) Subjects: Computer Engineering
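
    The Euler Characteristic Curve itself is simple to state: at each filtration value, sum the counts of simplices born so far with alternating signs by dimension. A minimal sketch of the definition follows (the definition only, not the dissertation's algorithm; the toy filtration is invented for illustration).

    ```python
    # Euler Characteristic Curve from a list of (dimension, birth) simplices.
    import numpy as np

    def ecc(simplices, thresholds):
        curve = []
        for t in thresholds:
            # chi(t) = #vertices - #edges + #triangles - ... among simplices born by t
            chi = sum((-1) ** dim for dim, birth in simplices if birth <= t)
            curve.append(chi)
        return np.array(curve)

    # toy filtration: three vertices, then three edges, then the filling triangle
    simplices = [(0, 0.0), (0, 0.0), (0, 0.0),
                 (1, 0.5), (1, 0.6), (1, 0.7), (2, 0.9)]
    print(ecc(simplices, thresholds=[0.0, 0.5, 0.8, 1.0]))   # [3, 2, 0, 1]
    ```
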
  • 8. Gomez Flores, Mario Curvature Sets and Persistent Homology

    Doctor of Philosophy, The Ohio State University, 2023, Mathematics

    Given a metric space $(X,d_X)$, the $n$-th curvature set is the set of $n$-by-$n$ distance matrices generated by samples from $X$ with $n$ or fewer points. Similarly, the $(n,k)$ persistence set of $X$ is the set of $k$-dimensional persistence diagrams of all $n$-point samples from $X$. This dissertation has two major parts, dedicated to persistence sets and curvature sets respectively. A major obstacle that hampers the widespread use of topological techniques in data science is the sometimes prohibitive computational cost of persistent homology. Persistence sets aim to circumvent this limitation while retaining useful geometric and topological information from the input space. We study the experimental and theoretical properties of persistence sets and compare them with standard VR-persistent homology from the perspectives of computational efficiency and discriminative power, including in a practical shape classification task. We characterize several persistence sets of the circle, higher-dimensional spheres, surfaces with constant curvature, and a specific family of metric graphs, and we exhibit spaces that have different persistence sets but are indistinguishable by persistent homology. All in all, we believe that persistence sets can aid in data science tasks where shape is important but the standard persistent homology algorithms are impractical. In the second part, we study the curvature sets $\mathcal{K}_n(\mathbb{S}^1)$ of the circle $\mathbb{S}^1$ as a topological space. We give several characterizations of $\mathcal{K}_n(\mathbb{S}^1)$ as quotients of tori under group actions, and use them to compute the homology of $\mathcal{K}_n(\mathbb{S}^1)$ with Mayer-Vietoris arguments. We construct an abstract simplicial complex $\mathrm{St}_n(\mathbb{S}^1)$ whose geometric realization is $\mathcal{K}_n(\mathbb{S}^1)$. Moreover, we give an embedding of $\mathrm{St}_n(\mathbb{S}^1)$ in $\mathbb{R}^{n \times n}$ and show that $\mathcal{K}_n(\mathbb{S}^1)$ is the union of the convex hulls of the faces of this embedding. Lastly, inspired by the characterization of a persistence set of su (open full item for complete abstract)

    Committee: Facundo Mémoli (Advisor); Matthew Kahle (Committee Member); David Anderson (Committee Member) Subjects: Mathematics
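
    A hedged sketch of how one might sample elements of a curvature set as defined in this abstract, here for the circle with its geodesic metric; the random sampling scheme is an illustrative assumption, not the dissertation's characterizations.

    ```python
    # Approximate the n-th curvature set of S^1 by sampling distance matrices.
    import numpy as np

    def circle_distance_samples(n, trials=1000, seed=1):
        rng = np.random.default_rng(seed)
        mats = []
        for _ in range(trials):
            theta = rng.uniform(0.0, 2.0 * np.pi, size=n)        # n points on S^1
            diff = np.abs(theta[:, None] - theta[None, :])
            mats.append(np.minimum(diff, 2.0 * np.pi - diff))    # geodesic metric
        return mats

    print(circle_distance_samples(3)[0])   # one 3-by-3 geodesic distance matrix
    ```
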
  • 9. Pathirana, Hasani An efficient framework for hypothesis testing using Topological Data Analysis

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2023, Statistics

    Topological data analysis (TDA) has become a popular approach in recent years to study the shape structure of data. Persistent homology is a widely used TDA tool that describes how topology changes through a nested sequence of topological spaces in the form of simplicial complexes. As the sequence evolves, topological features (e.g., connected components, loops, and voids) appear and disappear, and the summary of this evolution is reported as a persistence diagram (PD). Considered topological signatures of the data, the space of PDs can be endowed with a Wasserstein distance that enjoys a stability property. However, since PDs are not vectors in Euclidean space and calculating Wasserstein distances can be computationally costly, they are of limited use as direct inputs to machine learning tasks. A common remedy is to map PDs into a space of functions and to vectorize them by evaluating them over a grid of scale values, which yields vector summaries in Euclidean space. The Betti function, which incorporates weights, is one of the simplest functional summaries of PDs leading to such vector representations. Even though it is the easiest to construct and fast to compute, no stability result had been proven for it, to the best of our knowledge. In the present work, we introduce a new method to vectorize the Betti function by integrating it between consecutive scale values of a grid. The resulting vector summary, named the vector of averaged Bettis (VAB), provides a lower-dimensional, informative, and computationally efficient vector representation compared with the standard method of vectorizing the Betti function. We also prove a stability result for a class of weight functions with respect to the Wasserstein distance. Further, through several experimental studies, we show that the permutation-based hypothesis testing procedure, which aims to identify whether two sets of PDs are drawn from the same distribution or process, can be improved i (open full item for complete abstract)

    Committee: Umar Islambekov Ph.D. (Committee Chair); Paul Morris Ph.D. (Other); Kit Chan Ph.D. (Committee Member); Maria Rizzo Ph.D. (Committee Member) Subjects: Statistics
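
    The VAB construction described above can be approximated numerically: evaluate the Betti function on a fine grid and average between consecutive grid values. The thesis integrates exactly; the numerical averaging below, and the unweighted Betti function, are assumptions of this sketch.

    ```python
    # Betti function of a persistence diagram and a VAB-style vector summary.
    import numpy as np

    def betti_function(diagram, t):
        # diagram: array of (birth, death) rows; count intervals alive at scale t
        return np.sum((diagram[:, 0] <= t) & (t < diagram[:, 1]))

    def vector_of_averaged_bettis(diagram, grid, samples_per_cell=10):
        vab = []
        for a, b in zip(grid[:-1], grid[1:]):
            ts = np.linspace(a, b, samples_per_cell)
            # numerical stand-in for (1/(b-a)) * integral of the Betti function
            vab.append(np.mean([betti_function(diagram, t) for t in ts]))
        return np.array(vab)

    dgm = np.array([[0.0, 1.0], [0.2, 0.8], [0.5, 0.6]])
    print(vector_of_averaged_bettis(dgm, grid=np.linspace(0, 1, 5)))
    ```
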
  • 10. Sens, Aaron Topology Preserving Data Reductions for Computing Persistent Homology

    MS, University of Cincinnati, 2021, Engineering and Applied Science: Computer Science

    Topological Data Analysis (TDA) is a relatively recent development in the field of data mining that extends algebraic topology into the realm of computational data mining. TDA has unique advantages compared to other data mining techniques, namely that it is noise-resistant and that it generates unique identifiers, in the form of persistence intervals, for any d-dimensional numerical data set with d > 1. These data sets are commonly referred to as point clouds. Persistent homology is one of the most widely used methods of TDA. It is a computational process that creates a multi-dimensional topological representation of an input point cloud. Computing persistent homology is exponential in time and memory relative to the size of the input point cloud, and this constraint on point cloud size and dimension is the primary obstacle to TDA with persistent homology being widely adopted as a data mining technique. Due to the exponential nature of the computation, state-of-the-art tools are currently limited to a few thousand points in 3 dimensions, and to smaller point clouds in higher dimensions (unless the computation is somehow limited to lower-dimensional homologies and bounded connectivity distances). This thesis explores techniques for sampling larger point clouds to enable the computation of persistent homology. Several techniques for sampling large point clouds have been explored previously; studies using random or heuristic sampling methods are well established. More recently, the k-means++ clustering algorithm has been used to create reduced mappings (samplings) of the data: the algorithm locates a collection of nano-clusters whose centroids are extracted to represent the data by a smaller set of points, with the number of nano-clusters chosen so that persistent homology can be computed. While it is known that k-means++ clustering can be used in this way and (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Gowtham Atluri Ph.D. (Committee Member); Wen-Ben Jone Ph.D. (Committee Member) Subjects: Computer Science
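
    The k-means++ reduction strategy this abstract describes can be sketched with off-the-shelf tools; the use of scikit-learn and ripser here, and the cluster count, are assumptions of this sketch rather than the thesis's toolchain.

    ```python
    # Reduce a large point cloud to k-means++ centroids, then compute PH on them.
    import numpy as np
    from sklearn.cluster import KMeans
    from ripser import ripser

    X = np.random.rand(20000, 3)                     # too large for direct PH
    km = KMeans(n_clusters=500, init='k-means++', n_init=1).fit(X)
    reduced = km.cluster_centers_                    # one point per nano-cluster
    dgms = ripser(reduced, maxdim=1)['dgms']         # PH on the reduced cloud
    print(len(dgms[1]), "H1 intervals from 500 representative points")
    ```
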
  • 11. Malott, Nicholas Partitioned Persistent Homology

    MS, University of Cincinnati, 2020, Engineering and Applied Science: Computer Engineering

    In an era of increasing data size and complexity, novel algorithms to mine embedded relationships within big data are being studied throughout high-performance computing. One such approach, Topological Data Analysis (TDA), studies homologies in a point cloud. The identified homologies represent topological features (loops, holes, and voids) that are independent of continuous deformation and resilient to noise, and they identify a generalized structure of the input data. TDA has shown promise in several applications, including the study of various networks, digital images and movies, protein analysis and classification, and genomic sequences. TDA provides an alternative approach to classification for general data mining applications. The principal tool of TDA for examining the topological structures in a point cloud is Persistent Homology. Persistent homology identifies the underlying homologies of the point cloud as its connectedness increases, or persists. The output of persistent homology can be used to classify embedded structures in high-dimensional spaces, enable automated classification from connected-component analysis, identify abnormalities in data streams, and classify the structure of protein chains, among many other scientific and biological applications. Although the uses are vast, persistent homology suffers from exponential growth in both run-time and memory complexity, limiting the evaluation of large datasets. Persistent homology cannot be applied directly to large point clouds; modern tools are limited to the analysis of a few thousand points in R^3. Optimized data structures and enhancements to the algorithm have led to significant performance improvements, but the approaches still scale poorly with the size of the input point cloud. One approach studied over recent years is to reduce the input point cloud, either in size or in dimension. These reductions gain obvious improvemen (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Gowtham Atluri Ph.D. (Committee Member); Wen-Ben Jone Ph.D. (Committee Member) Subjects: Computer Engineering
  • 12. Moitra, Anindya Computation and Application of Persistent Homology on Streaming Data

    PhD, University of Cincinnati, 2020, Engineering and Applied Science: Computer Science and Engineering

    Persistent homology is a robust and powerful tool for data analysis that provides information about topological properties of data. Given a set of data points, persistent homology computes a set of intervals or lifespans of certain topological structures that appear and subsequently disappear through a sequence of nested spaces constructed on the data points. The topological structures and their lifespans enable the segregation of meaningful patterns from noise and often lead to the discovery of insight not discernible by conventional methods of data mining. The capabilities of persistent homology come at the cost of its high space complexity that grows exponentially with the number of input data points. This dissertation examines the application of persistent homology to streaming data, two important but disjoint areas of data science that never crossed paths before. The intensive computational requirements of persistent homology coupled with the unique challenges of dealing with a potentially infinite sequence of data objects in a stream are the primary reasons why persistent homology has not yet been applied to data stream mining. The dissertation proposes two general-purpose frameworks or models, called the microcluster model and the sliding-window model, for computing persistent homology on streaming data. Consistent with the standard computational paradigm for processing data streams, each of the models is organized into online and offline components. The online component maintains a summary of the data that preserves the topological structure of the stream. The offline component computes the persistence intervals from the data captured by the summary. The internal difference between the two models lies in the data structure used to maintain the summary of the stream during the online component. While the microcluster model employs statistics related to the weighted sum of the data vectors, the sliding-window model uses a topological structure that compr (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Gowtham Atluri Ph.D. (Committee Member); Raj Bhatnagar Ph.D. (Committee Member); Brian Gettelfinger Ph.D. (Committee Member); Badri Vellambi Ravisankar Ph.D. (Committee Member) Subjects: Computer Science
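
    A toy sketch of the online half of a microcluster-style summary, in the spirit of the paradigm described above but not Moitra's model: each arriving point is absorbed by the nearest centroid within a fixed radius or opens a new microcluster; the offline step would then run PH on the centroids. The class name, radius, and absorption rule are all assumptions of this sketch.

    ```python
    # Online summary of a stream by weighted microcluster centroids (illustrative).
    import numpy as np

    class MicroclusterSummary:
        def __init__(self, radius):
            self.radius, self.centers, self.weights = radius, [], []

        def insert(self, x):
            if self.centers:
                d = [np.linalg.norm(x - c) for c in self.centers]
                i = int(np.argmin(d))
                if d[i] <= self.radius:                      # absorb into nearest cluster
                    w = self.weights[i]
                    self.centers[i] = (w * self.centers[i] + x) / (w + 1)
                    self.weights[i] = w + 1
                    return
            self.centers.append(np.array(x, dtype=float))    # open a new microcluster
            self.weights.append(1)

    summary = MicroclusterSummary(radius=0.1)
    for point in np.random.rand(5000, 2):                    # simulated stream
        summary.insert(point)
    print(len(summary.centers), "microclusters summarize 5000 streamed points")
    ```
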
  • 13. Holmes, Wesley Topological Analysis of Averaged Sentence Embeddings

    Master of Science (MS), Wright State University, 2020, Computer Science

    Sentence embeddings are frequently generated with complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for efficiently generating high-quality sentence embeddings for highly specialized corpora. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings in document clustering and topic modeling tasks, and its effectiveness is compared with a simple clustering method. The sentence embeddings generated are found to be effective in similarity-based tasks and to group in useful ways when used with the TDA-based techniques explored as alternatives to traditional clustering-based approaches.

    Committee: Michael Raymer Ph.D. (Advisor); Mateen Rizki Ph.D. (Committee Member); Krishnaprasad Thirunarayan Ph.D. (Committee Member) Subjects: Computer Science
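
    The "averaged sentence embeddings" of the title can be illustrated minimally: a sentence embedding taken as the mean of its word vectors. The vocabulary and random vectors below are stand-ins for illustration, not the thesis's embedding models.

    ```python
    # Sentence embedding as the mean of word vectors (illustrative stand-in).
    import numpy as np

    def average_sentence_embedding(sentence, word_vectors):
        vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
        if not vecs:                                  # no known words: zero vector
            return np.zeros(next(iter(word_vectors.values())).shape)
        return np.mean(vecs, axis=0)

    word_vectors = {w: np.random.rand(50) for w in ["the", "virus", "spreads", "fast"]}
    print(average_sentence_embedding("The virus spreads fast", word_vectors).shape)
    ```
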
  • 14. Tian, Minghao Unravel the Geometry and Topology behind Noisy Networks

    Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering

    Graphs and network data are ubiquitous across a wide spectrum of scientific and application domains. Often in practice, an input graph can be considered an observed snapshot of a (potentially continuous) hidden domain or process, with subsequent analysis, processing, and inference performed on this observed graph. In this dissertation, we advocate the perspective that an observed graph is often a noisy version of some discretized 1-skeleton of a hidden domain, and specifically we propose the following network model $\widehat{G}_{\mathcal{X}}$ (called ER-perturbed random geometric graphs, or noisy random geometric graphs): we assume that there is a true graph $G^*_{\mathcal{X}}$ which is a certain proximity graph for points sampled from a hidden domain $\mathcal{X}$, while the observed graph $\widehat{G}_{\mathcal{X}}$ is an Erdős–Rényi-type perturbed version of $G^*_{\mathcal{X}}$. We analyze two aspects of this model: its geometry and its topology. The geometric problem we aim to solve is to recover the metric structure of the hidden domain $\mathcal{X}$ from the observed graph $\widehat{G}_{\mathcal{X}}$, which is orthogonal to the usual studies of network models (which often focus on characterizing or predicting the behaviors and properties of real-world networks). Two methods are proposed in this dissertation to recover the metric structure of $\mathcal{X}$ from $\widehat{G}_{\mathcal{X}}$: a Jaccard-filtering process based on the Jaccard (similarity) index, and an edge-clique-filtering process based on the edge clique number (a local version of the clique number). We rigorously prove that, using these two simple filtering processes, with high probability one can recover the shortest-path metric of $G^*_{\mathcal{X}}$ (which reflects the metric of $\mathcal{X}$) within a constant multiplicative factor. The topological problem we study is to estimate the clique number of noisy random geometric graphs $\widehat{G}_{\mathcal{X}}$. Clique numbe (open full item for complete abstract)

    Committee: Yusu Wang (Advisor); Srinivasan Parthasarathy (Advisor); Pooya Hatami (Committee Member); David Sivakoff (Committee Member) Subjects: Computer Science
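
    The Jaccard-filtering idea described above admits a short sketch: score each edge by the Jaccard similarity of its endpoints' neighborhoods and drop low-scoring edges, which tend to be the random Erdős–Rényi insertions. The networkx usage, the closed neighborhoods, and the threshold value below are illustrative assumptions, not the dissertation's proven procedure.

    ```python
    # Jaccard-index edge filtering on an ER-perturbed geometric graph (sketch).
    import networkx as nx

    def jaccard_filter(G, tau=0.1):
        H = nx.Graph()
        H.add_nodes_from(G)
        for u, v in G.edges():
            nu, nv = set(G[u]) | {u}, set(G[v]) | {v}    # closed neighborhoods
            if len(nu & nv) / len(nu | nv) >= tau:       # keep well-supported edges
                H.add_edge(u, v)
        return H

    G = nx.random_geometric_graph(300, 0.15)             # proximity "true" graph
    noisy = G.copy()
    noisy.add_edges_from(nx.gnp_random_graph(300, 0.01).edges())  # ER perturbation
    print(noisy.number_of_edges(), "->", jaccard_filter(noisy).number_of_edges())
    ```
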
  • 15. Li, Tianqi Efficient Algorithms to Compute Topological Entities

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Topological Data Analysis is a rapidly growing field that extracts and analyzes useful topological features from especially difficult datasets. Topological features, with their wide applicability, provide a richer and more robust view of shape than geometric features. In recent decades, the problem of computing the topological entities of a space has drawn much attention. In this thesis, I present my research on two topological entities: minimal homology bases of simplicial complexes and persistent homology of directed graphs. In the first part of this thesis, we focus on the efficient computation of shortest cycles that form a homology basis under $\mathbb{Z}_2$-additions in a given simplicial complex $\mathcal{K}$. This problem has been researched actively in recent years. The complexity of these algorithms depends on the rank $g$ of the one-dimensional homology group of $\mathcal{K}$, which can be as large as $\Theta(n)$, where $n$ denotes the number of simplices in $\mathcal{K}$. When $g=\Theta(n)$, the best previous algorithm, as far as we know, has an $O(n^{\omega+1})$ worst-case time complexity, where $\omega<2.373$ is the fast matrix multiplication exponent. In this work, we improve this time complexity. Combining the idea of annotations from \cite{annotation} with the divide-and-conquer technique from \cite{DivideConquer}, we present an algorithm that runs in $O(n^\omega+n^2g)$ time, giving the first $O(n^3)$ worst-case algorithm for general complexes. If, instead of a minimal basis, we settle for an approximate basis, we can improve the running time even further: we show that a $2$-approximation of the minimal basis can be computed in $O(n^{\omega}\sqrt{n\log n})$ worst-case time. We also study more general measures for defining the minimal basis and identify reasonable conditions on these measures that allow computing a minimal basis efficiently. We extend our approach to higher-dimensional cycles, where we improve the running time fo (open full item for complete abstract)

    Committee: Rephael Wenger (Advisor); Yusu Wang (Committee Member); Tamal Dey (Committee Member); Pooya Hatami (Committee Member) Subjects: Computer Science
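
    For intuition only: in the purely 1-dimensional graph case, a minimum-weight cycle basis plays the role of the shortest homology basis studied in this dissertation, and networkx computes one directly. The thesis's algorithms for general simplicial complexes are far more involved than this.

    ```python
    # Minimum cycle basis of a graph as a 1-dimensional homology-basis analogue.
    import networkx as nx

    G = nx.grid_2d_graph(4, 4)                 # 4x4 grid: first Betti number is 9
    basis = nx.minimum_cycle_basis(G)          # list of cycles, one per basis element
    print(len(basis), "independent cycles; lengths:", sorted(len(c) for c in basis))
    ```
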
  • 16. Larson, Michael A Progressive Refinement of Postural Human Balance Models Based on Experimental Data Using Topological Data Analysis

    Master of Science, Miami University, 2020, Mechanical and Manufacturing Engineering

    Injuries related to falling occur with high frequency and severity in geriatric individuals, as well as in those medically impaired by neuromuscular diseases (such as Parkinson's, multiple sclerosis, and concussions). It has been shown that modeling a human's standing posture as an inverted pendulum with a continuous feedback loop, and analyzing an individual's center-of-pressure location during quiet standing, can provide insight into stability states through topological data analysis. The stability state in turn offers information on the healthiness of an individual's signal. However, only synthetically created signals derived from the model itself have previously undergone state classification. This thesis investigated different models of human balance by analyzing their similarities with experimental data using topological data analysis. The objective was to competitively iterate through model types and variations in order to convincingly support a superior model type for use in future stability-state classification research and studies.

    Committee: James Chagdes (Advisor); Ryan Kramer (Committee Member); Amit Shukla (Committee Member) Subjects: Kinesiology; Mechanical Engineering
  • 17. Morrison, Kevin Topological Data Analysis and Applications to Influenza

    Master of Science, Miami University, 2020, Mathematics

    This thesis provides an overview of Topological Data Analysis (TDA) for the advanced undergraduate or early graduate student, assuming familiarity with basic graph theory, point-set topology, and introductory algebraic topology. The thesis also studies the phylogenetic relationships of the NA gene in various influenza A strains and determines whether horizontal evolution has occurred among the strains under consideration. In doing so, it serves as a verification of the results of a 2013 paper in which TDA was used to study influenza A HA and NA.

    Committee: Daniel Farley PhD (Advisor); Paul Larson PhD (Committee Member); Ivonne Ortiz PhD (Committee Member) Subjects: Bioinformatics; Biology; Genetics; Mathematics; Virology
  • 18. Kim, Woojin The Persistent Topology of Dynamic Data

    Doctor of Philosophy, The Ohio State University, 2020, Mathematics

    We refine the theoretical foundations of Topological Data Analysis (TDA) for the multiscale analysis of dynamic topology, such as dynamic metric spaces or dynamic networks. Motivations include, but are not limited to, the characterization of flocking or swarming behavior of animals and social networks in the human sphere. In order to quantify the differences of such dynamics, we also generalize the Gromov-Hausdorff distance. We not only examine the resulting metric geometry, but also find practical algorithms for approximating those novel distances. To establish our results, we primarily exploit concepts from algebraic topology, metric geometry, combinatorics, and category theory, blending these with ideas of persistence in TDA. More specifically, the main achievements of this thesis include (a) The development of stable and informative invariants that encode spatiotemporal topological features of dynamic metric spaces and dynamic networks, (b) The establishment of a comparison framework for dynamic metric spaces or dynamic networks by extending the Gromov-Hausdorff distance or the Gromov-Wasserstein distance, (c) Generalization of the erosion distance by Patel for quantifying within polynomial time the difference between dynamic metric spaces or more generally multiparameter persistence modules, and (d) Extension of the notion of persistence diagram for summarizing persistence modules over posets, which often arise from dynamic data, from a standpoint of combinatorics and category theory.

    Committee: Facundo Mémoli (Advisor) Subjects: Mathematics
  • 19. Wang, Jiayuan Algorithms for Guaranteed Denoising of Data and Their Applications

    Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering

    Removing noise and recovering signals is a fundamental task in data analysis. Noise is everywhere: in supervised learning, samples in the training set can be mislabeled; in road-network reconstruction, erroneous trajectories can come from low sampling rates and bad GPS signals. Despite the fact that much work has been done in this area, the problem remains challenging because real-life noise is hard to model and usually little knowledge of the ground truth is available. On the other hand, in many situations the data can be assumed to sample a hidden space, called the ground truth, and different types of noise, such as Gaussian and/or ambient noise, can be associated with it. For all types of noise, the signal should prevail in density, meaning that the data density should be higher near the ground truth. My work deals with such noisy data in two contexts. In the first, we consider eliminating noise from point cloud data sampled from a hidden ground truth K in a metric space. General denoising methods such as deconvolution and thresholding require the user to choose parameters and noise models. We first give a denoising algorithm with one parameter under a very general sampling condition; we provide a theoretical guarantee for this algorithm and argue that the one parameter cannot be avoided. We then propose a parameter-free denoising algorithm with a slightly stronger sampling condition. We show that our methods perform well on noisy uniform/adaptive point clouds through experiments on a 2D density field, 3D models, and handwritten digits. In the second context, we consider reconstructing a hidden graph from a noisy sample. Recently, a method based on discrete Morse theory and persistent homology has found applications in multiple areas, for example reconstructing road networks from GPS trajectories and extracting filamentary structures from cosmology data. However, little theoretical analysis exists for the Discrete (open full item for complete abstract)

    Committee: Tamal Dey (Advisor); Yusu Wang (Advisor); Han-Wei Shen (Committee Member) Subjects: Computer Science
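
    The density intuition stated in this abstract ("the signal should prevail in density") can be illustrated with a simple k-nearest-neighbor filter; this is a naive baseline with a user-chosen parameter, not the dissertation's guaranteed or parameter-free algorithms.

    ```python
    # Naive density-based denoising: keep the densest points by k-NN distance.
    import numpy as np
    from scipy.spatial import cKDTree

    def knn_denoise(X, k=10, keep=0.8):
        tree = cKDTree(X)
        dists, _ = tree.query(X, k=k + 1)        # first neighbor is the point itself
        score = dists[:, -1]                     # k-NN distance: small means dense
        cutoff = np.quantile(score, keep)
        return X[score <= cutoff]                # keep the densest fraction

    signal = np.random.randn(900, 2) * 0.1       # dense region near ground truth
    noise = np.random.uniform(-2, 2, (100, 2))   # sparse ambient noise
    X = np.vstack([signal, noise])
    print(len(knn_denoise(X)), "points kept of", len(X))
    ```
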
  • 20. Siegrist, Kyle Diagnostic Analysis of Postural Data using Topological Data Analysis

    Master of Science, Miami University, 2019, Mechanical and Manufacturing Engineering

    Understanding the mechanisms behind human balance has been a prominent subject of interest, as various postural instabilities have been linked to neuromuscular impairments and diseases (Parkinson's, multiple sclerosis, and concussions). This thesis presents an unsupervised method that breaks any physical model, given a sampled manifold, into its local system behaviors and allows for the detection of intermittent instabilities in an individual's postural stance via a co-activation topological mapping scheme. The proposed scheme is powerful because it not only enables a diagnostic analysis of an individual's postural stance by locating the mechanisms responsible for intermittent instabilities, but it is also able to differentiate the postural data of stable individuals who exhibit no such instabilities.

    Committee: James Chagdes Dr. (Advisor); Amit Shukla Dr. (Committee Member); Ryan Kramer Dr. (Committee Member) Subjects: Biomechanics; Mechanical Engineering