Search Results

(Total results 32)

  • 1. Wang, Dingkang Understanding Noise and Structure behind Metric Spaces

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Metric embedding is the representation of (high-dimensional) input data in a simple metric space, e.g., low-dimensional Euclidean space or tree space. Metric embedding reveals the underlying structure behind data and facilitates further analyses. It is important in many application areas, such as machine learning, computational biology, and genetics. Most recent studies focus on embedding with the smallest possible distortion. Despite the significant amount of work, many challenging problems remain. In my Ph.D. thesis, I study three directions to broaden the scope of metric embedding for its use in practice. In the first work, I study metric embedding with outliers. Given a metric space $\mathcal{X} = (X, \rho)$, the goal is to find a small set of outlier points $K \subset X$ and a low-distortion embedding of $(X \setminus K, \rho)$ into Euclidean space of some fixed dimension. This is a natural problem that captures scenarios where a small fraction of the input points are noise. I consider embeddings with small $\ell_{\infty}$ distortion and present a polynomial-time bi-criteria approximation algorithm. The algorithm has been applied to the MNIST dataset and a 3D object dataset. In the second work, I consider the problem of ordinal consensus with multiple metrics. Given $k$ metrics on the same set $X$ of $n$ points, I aim to find a minimum subset $S$ such that the metrics restricted to $X \setminus S$ have "consistent" orders. This problem is important because metrics are usually not comparable and should not be directly combined. Instead, the orders of pairwise distances carry sufficient information for metric embedding and are independent of the scale on which the metric lies. To this end, I propose two ways to measure the ordinal consistency of multiple metrics, called strong and weak consistency, respectively. I derive a range of hardness results for finding an optimal subset under both definitions with different types of input metrics. I also …

    Committee: Rephael Wenger (Advisor) Subjects: Computer Science
  • 2. Shanmugam Sakthivadivel, Saravanakumar Fast-NetMF: Graph Embedding Generation on Single GPU and Multi-core CPUs with NetMF

    Master of Science, The Ohio State University, 2019, Computer Science and Engineering

    There is growing interest in learning representations for nodes in a network. Several embedding generation algorithms have been proposed in the last few years that generate high-quality representations for downstream tasks such as node classification and link prediction. NetMF is one such algorithm, and it provides the theoretical foundation for showing that several network representation learning techniques implicitly factorize a closed-form matrix derived from the graph. However, the NetMF algorithm is slow and does not scale well, owing to its multiple dense matrix multiplication steps and its Singular Value Decomposition (SVD). We present Fast-NetMF, a fast, highly scalable version of the NetMF algorithm with reduced running time. In this work, we investigate the acceleration of NetMF in single-GPU and multi-core CPU settings. We also investigate replacing the slow SVD-based matrix factorization step with faster, more parallel-friendly factorization techniques such as Non-negative Matrix Factorization (NMF).

    Committee: Srinivasan Parthasarathy (Advisor); Sadayappan P (Committee Member) Subjects: Computer Engineering; Computer Science
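A minimal dense sketch of the small-window NetMF matrix, following the closed form given in the original NetMF paper (Qiu et al., WSDM 2018), may help locate the costly steps the abstract names. It is not the Fast-NetMF code. The function name, the window size `window` (T), the negative-sampling constant `neg` (b), and the assumption of an undirected graph with no isolated nodes are illustrative choices.

```python
import numpy as np

def netmf_small_window(A: np.ndarray, dim: int, window: int = 3, neg: float = 1.0) -> np.ndarray:
    """Dense NetMF embedding for a small window size (illustrative sketch).

    Assumes A is the symmetric adjacency matrix of an undirected graph
    with no isolated nodes (every degree > 0).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    vol = deg.sum()                      # vol(G) = sum of all degrees
    D_inv = np.diag(1.0 / deg)
    P = D_inv @ A                        # random-walk transition matrix D^{-1} A
    S = np.zeros_like(A)
    P_r = np.eye(A.shape[0])
    for _ in range(window):              # sum_{r=1}^{T} P^r via repeated dense products
        P_r = P_r @ P
        S += P_r
    M = (vol / (neg * window)) * (S @ D_inv)
    logM = np.log(np.maximum(M, 1.0))    # truncated logarithm log(max(M, 1))
    U, s, _ = np.linalg.svd(logM)        # full dense SVD: the expensive step
    return U[:, :dim] * np.sqrt(s[:dim]) # embedding U_d * Sigma_d^{1/2}
```

The repeated dense products and the full SVD dominate the cost as the graph grows. Because the truncated logarithm makes every entry of the factorized matrix non-negative, that matrix is also a valid input for NMF, the substitution the abstract explores.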
  • 3. Jang, Haedong NONLINEAR EMBEDDING FOR HIGH EFFICIENCY RF POWER AMPLIFIER DESIGN AND APPLICATION TO GENERALIZED ASYMMETRIC DOHERTY AMPLIFIERS

    Doctor of Philosophy, The Ohio State University, 2014, Electrical and Computer Engineering

    A fully model-based nonlinear embedding device model, including low- and high-frequency dispersion effects, is implemented for the Angelov device model and successfully demonstrated for load-modulation power amplifier (PA) applications. Using this nonlinear embedding device model, any desired PA mode of operation at the current-source plane can be projected to the external reference planes to synthesize the required multi-harmonic source and load terminations. A 2D identification of the intrinsic PA operation modes is first performed at the current-source reference planes. For intrinsic modes defined without lossy parasitics, most of the required source impedance terminations exhibit a substantial negative resistance after projection to the external reference planes. These terminations can then be implemented by active harmonic injection at the input. It is verified experimentally, for a 15 W GaN HEMT in class-AB mode, that injecting at the input the second harmonic synthesized by the embedding device model improves drain efficiency by up to 5%, in agreement with simulation. An asymmetric Doherty amplifier was built using two packaged GaN transistors with 15 W peak power to demonstrate the proposed method. Drain efficiencies of 71% at the peak power of 41.8 dBm and 62.7% at the second peak of 32.8 dBm (9 dB back-off) were observed. Drain efficiency above 50% was maintained over an 11 dB output power range. After linearization, an average drain efficiency of 51.86% was observed while maintaining an adjacent channel power ratio of -51.46 dBc, with excitation by 10 MHz bandwidth long-term evolution (LTE) signals with a 9.96 dB peak-to-average power ratio. A novel procedure was introduced for designing Doherty amplifiers using the model-based nonlinear embedding technique. First, the Doherty intrinsic load matching network is designed at the transistor current-source reference plane with the main and auxiliary devices interconnected. Identical devices with different biasing are used …

    Committee: Patrick Roblin (Advisor); Roberto Rojas-Teran (Committee Member); Steven Bibyk (Committee Member); Christopher Hadad (Committee Member) Subjects: Electrical Engineering
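The figures above mix dBm, dB, and percentages. The short sketch below assumes only the standard definitions P[W] = 10^(P[dBm]/10) / 1000 and drain efficiency η_D = P_out / P_DC. It shows how the 9 dB back-off follows from the two reported power levels, and what DC supply power a 71% drain efficiency at 41.8 dBm would imply. The implied DC power is derived arithmetic, not a value reported in the thesis, and the helper names are hypothetical.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert power in dBm to watts: P[W] = 10^(P[dBm] / 10) / 1000."""
    return 10 ** (p_dbm / 10) / 1000

def implied_dc_power(p_out_dbm: float, drain_eff: float) -> float:
    """DC supply power implied by drain efficiency eta_D = P_out / P_DC."""
    return dbm_to_watts(p_out_dbm) / drain_eff

peak_dbm, second_peak_dbm = 41.8, 32.8
# Back-off in dB is simply the difference of the two levels in dBm.
print(f"back-off: {peak_dbm - second_peak_dbm:.1f} dB")
print(f"peak output power: {dbm_to_watts(peak_dbm):.2f} W")
print(f"implied DC power at 71% drain efficiency: {implied_dc_power(peak_dbm, 0.71):.2f} W")
```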
  • 4. Li, Mengzhen Graph Representation Learning for Integrated Analysis of Biological Networks

    Doctor of Philosophy, Case Western Reserve University, 2024, EECS - Computer and Information Sciences

    With the rapid development of biotechnologies that generate complex biological data, as well as graph machine learning algorithms to analyze these data, network-based analyses are becoming popular in modern biological research. Many network-based methods have been proposed to process biological data. Network embedding, which learns a representation of nodes in a low-dimensional space, has become a new learning paradigm in network analysis. Graph representation learning algorithms are commonly applied to a broad range of prediction tasks in systems biology. Mapping biological networks into a low-dimensional space enables the effective application of machine learning methods to downstream tasks. In this dissertation, we present four new algorithms to compare and integrate biological networks, with a view to improving the reliability of graph machine learning algorithms on biological networks. These algorithms address inter-related problems to provide a comprehensive framework for network integration. Many real-world networks used in machine learning have multiple versions that come from different sources. For such networks, computing consensus embeddings from the node embeddings of the individual versions can be useful for various reasons. GraphCan is a framework for computing canonical representations of biological networks using a similarity-based Graph Convolutional Network; it uses consensus embedding to integrate multiple node similarity measures into canonical node embeddings for a given network. GraphCan can be applied to different types of downstream tasks. BiGraphCan is an extension of GraphCan that aims to make bipartite predictions for understudied proteins using similarity networks and bipartite networks. We also explore the network alignment problem by generalizing the Gromov-Wasserstein a…

    Committee: Mehmet Koyuturk (Advisor); Rong Xu (Committee Member); Yinghui Wu (Committee Member); Jing Li (Committee Member) Subjects: Computer Science
  • 5. Dave, Brandon Understanding Impact of Graph Structure on Knowledge Graph Embedding

    Master of Science (MS), Wright State University, 2024, Computer Science

    The effectiveness of a deployed knowledge graph is commonly evaluated with use cases defined by domain experts. This poses challenges during the development cycle in determining how to represent data. Developers can optionally include semantics in a knowledge graph by abstracting the data representation so that it mirrors information as it exists in the real world. Consequently, the abstraction is represented by additional layers, resulting in performance differences in knowledge graph embedding, such as in the embedded model's ability to infer facts between entities through link prediction. This thesis presents a comprehensive analysis of the performance impact observed across a range of knowledge graph embedding models trained on FB15k-237, a widely recognized benchmark dataset for knowledge graph completion. Additionally, the experiment is performed with augmented versions of FB15k-237, which serve to introduce semantics into the knowledge graph.

    Committee: Cogan Shimizu Ph.D. (Advisor); Wen Zhang Ph.D. (Committee Member); Lingwei Chen Ph.D. (Committee Member) Subjects: Computer Science
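The abstract does not name the embedding models compared. As one widely used example, TransE scores a triple (head, relation, tail) by how close h + r lies to t, and link prediction ranks candidate tails by that score. The sketch below uses synthetic, untrained vectors and hypothetical entity names only to show the scoring and ranking; a real model would learn the embeddings from FB15k-237 triples.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE plausibility: higher (closer to 0) means h + r lies nearer to t (L1 distance)."""
    return -float(np.linalg.norm(h + r - t, ord=1))

def rank_tails(h: np.ndarray, r: np.ndarray, candidates: dict) -> list:
    """Link prediction: order candidate tail entities by TransE score, best first."""
    return sorted(candidates, key=lambda name: transe_score(h, r, candidates[name]), reverse=True)

rng = np.random.default_rng(0)
dim = 8
# Hypothetical, untrained embeddings; "true_tail" is constructed to satisfy h + r ≈ t.
head, rel = rng.normal(size=dim), rng.normal(size=dim)
entities = {
    "true_tail": head + rel + 0.01 * rng.normal(size=dim),
    "other_a": rng.normal(size=dim),
    "other_b": rng.normal(size=dim),
}
print(rank_tails(head, rel, entities))
```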
  • 6. Gong, Pingzhu Design of a Broadband Doherty Power Amplifier with a Graphical User Interface Tool

    Master of Science, The Ohio State University, 2022, Electrical and Computer Engineering

    A novel broadband Doherty Power Amplifier (DPA) design for 4G applications is presented, developed with the aid of a modified version of the DPA Graphical User Interface (GUI) tool. Compared to the previous version, the new GUI takes the Output Back-Off (OBO) as its input instead of the peak power ratio of the DPA main and auxiliary PAs. Furthermore, the load impedance seen by the Main PA (MPA) at the 2nd and 3rd harmonics can be directly controlled from the GUI. In this design, the main PA operates in class-F mode and the auxiliary PA in class-C mode. The main PA is biased at -3.2 V at the gate (lower than the typical values of -2.7 V to -3 V) to boost efficiency. A new parasitic network with a simple structure is extracted using the OSU Embedding Model. The design procedure is as follows. First, the fundamental load impedances seen by both PAs are generated by the GUI at the Current-source Reference Plane (CRP). Then, the Output Matching Network (OMN) is synthesized at the CRP for each PA separately. Finally, the input matching network and the equal-split Wilkinson power divider are designed to realize the broadband DPA. Electromagnetic (EM) co-simulation shows that the DPA achieves 59.4-62.2% drain efficiency at 8 dB OBO over 1.5-2.2 GHz with a gain of 10-13 dB. The peak power ranges from 43.7 dBm to 44.5 dBm. The THD and IM3 are around -15 dB at back-off power. The performance of the proposed DPA design is shown to be comparable to that of state-of-the-art designs, indicating that the DPA performs well.

    Committee: Patrick Roblin (Advisor); Wladimiro Villarroel (Committee Member) Subjects: Electrical Engineering
  • 7. Wang, Qingsong The Persistent Topology of Geometric Filtrations

    Doctor of Philosophy, The Ohio State University, 2022, Mathematics

    We study the theoretical foundation of the persistent topology of geometric filtrations in Topological Data Analysis (TDA), such as Vietoris–Rips simplicial complexes and Vietoris–Rips metric thickenings. We introduce an $\ell_p$-relaxation of the Vietoris–Rips metric thickening, where $p=\infty$ recovers the usual Vietoris–Rips metric thickening. We prove a stability theorem for the persistent homology of $\ell_p$-relaxed metric thickenings, which is novel even in the case $p=\infty$. The stability theorem can then be employed to show that the filtrations by Vietoris–Rips simplicial complexes and by Vietoris–Rips metric thickenings have the same persistence diagram. Therefore, we can employ measure-theoretic methods to study the Vietoris–Rips complex. A recent study also suggests that the persistent homology of the Vietoris–Rips simplicial complex changes when the scale passes the diameter of an extremal configuration of the diameter functional. As an example, we study extremal configurations on spheres: we implemented the diameter gradient flow and obtained nontrivial extremal configurations on $\mathbb{S}^2$ and $\mathbb{S}^3$. We find a natural condition on metric spaces that guarantees the vanishing of the persistence diagram of the Vietoris–Rips filtration in certain dimensions. We also demonstrate, via a non-collapsing result, that persistent features can be used to obtain a quantitative lower bound for the Gromov–Hausdorff distance between Riemannian manifolds.

    Committee: Facundo Mémoli (Advisor); Jean-François Lafont (Committee Member); Matthew Kahle (Committee Member) Subjects: Mathematics
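To make the first object in this abstract concrete: the Vietoris–Rips complex VR(X; r) of a finite metric space contains every simplex whose diameter is at most r. Some authors use a strict inequality instead; the sketch below assumes the ≤ convention. It builds the complex up to dimension 2 from a distance matrix by naive enumeration, which is exponential in the dimension and meant only for tiny examples. The function name and the four-point example are illustrative.

```python
import math
from itertools import combinations

def vietoris_rips(dist, r, max_dim=2):
    """Simplices (sorted vertex tuples) of VR(X; r) up to dimension max_dim,
    assuming the convention diam(sigma) <= r. `dist` is a symmetric distance matrix."""
    n = len(dist)
    simplices = [(i,) for i in range(n)]          # every point is a 0-simplex
    for k in range(2, max_dim + 2):               # k = number of vertices in the simplex
        for sigma in combinations(range(n), k):
            # A simplex enters once all its pairwise distances are within r.
            if all(dist[i][j] <= r for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Four points on the unit circle at 0, 90, 180 and 270 degrees (Euclidean distances).
pts = [(math.cos(k * math.pi / 2), math.sin(k * math.pi / 2)) for k in range(4)]
D = [[math.dist(p, q) for q in pts] for p in pts]
```

At r = 1.5 only the four sides (length √2) are present, so the complex is a 4-cycle with a one-dimensional hole. Once r reaches 2, the diagonals and all four triangles enter and the loop is filled. Persistent homology records exactly this birth and death of features as r grows.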
  • 8. Dozier, Robbie Navigating the Metric Zoo: Towards a More Coherent Model For Quantitative Evaluation of Generative ML Models

    Master of Sciences, Case Western Reserve University, 2022, EECS - Computer and Information Sciences

    This thesis studies a family of high-dimensional generative procedures modeled by Deep Generative Models (DGMs). These models can sample from complex manifolds to create realistic images, video, audio, and more. In prior work, generative models were evaluated using likelihood criteria. However, likelihood has been shown to suffer from the curse of dimensionality, and some generative architectures, such as Generative Adversarial Networks (GANs), do not admit a likelihood measure. While other metrics for GANs have been proposed in the literature, there has been no systematic study and comparison of them. First, I conduct the first comprehensive empirical analysis of these generative metrics, comparing them across several axes, including sample quality, diversity, and computational efficiency. Second, I propose a new metric that employs the concept of typicality from information theory and compare it to existing metrics. My work can guide the choice of which kind of metric to use when training DGMs.

    Committee: Soumya Ray (Advisor); Michael Lewicki (Committee Member); Harold Connamacher (Committee Member) Subjects: Artificial Intelligence; Computer Science
  • 9. Mei, Mei A Framework for the Discovery and Tracking of Ideas in Longitudinal Text Corpora

    PhD, University of Cincinnati, 2022, Engineering and Applied Science: Computer Science and Engineering

    The emergence and evolution of ideas is one of the most important processes in human society and has been a topic of great interest for philosophers and historians. Psychologists have also attempted to develop models of how new ideas arise from the recombination of existing ones, and have proposed modeling this process as analogous to biological evolution. However, studying the evolution of ideas has been limited by the difficulty of obtaining systematic data. The recent exponential growth in electronic data promises a solution, but several impediments remain, including the lack of a systematic process for extracting ideas and of methods for analyzing their dynamics over time. While the general problem of identifying ideas in texts is extremely complex, one possible approach is to look at how meaning is distributed in documents and to study the evolution of this structure across documents and over time. The research in this dissertation develops a framework for doing this in large, longitudinal corpora of documents. This system, called the Framework for the Analysis of Semantic Structure Evolution in Text (FASSET), exploits the statistics of changing word usage within the corpus in combination with machine learning techniques, including topic analysis, semantic embedding, adaptive clustering, and dimensionality reduction for visualization. It represents an integrated model for extracting semantic structure within single documents, within text corpora with a single time-stamp, and across a longitudinally extensive corpus. It includes new methods for text segmentation, longitudinal topic identification, and longitudinal semantic clustering. The goal is to provide a system for exploring a simplified "systems biology" of ideas through which the evolution of ideas can be studied at various levels. The FASSET system is applied to two large longitudinal corpora: speeches in the U.S. Congress over a period of 36 consecutive years, and papers presented at the Int…

    Committee: Ali Minai Ph.D. (Committee Member); Gowtham Atluri Ph.D. (Committee Member); Carla Purdy Ph.D. (Committee Member); Simona Doboli Ph.D. (Committee Member); Raj Bhatnagar Ph.D. (Committee Member) Subjects: Computer Science
  • 10. Groeger, Alexander Texture-Driven Image Clustering in Laser Powder Bed Fusion

    Master of Science (MS), Wright State University, 2021, Computer Science

    The additive manufacturing (AM) field is striving to identify anomalies in laser powder bed fusion (LPBF) using multi-sensor in-process monitoring paired with machine learning (ML). In-process monitoring can reveal the presence of anomalies, but creating an ML classifier requires labeled data. The present work approaches this problem by printing hundreds of Inconel-718 coupons with different processing parameters to capture a wide range of process monitoring imagery with multiple sensor types. The process monitoring images are then encoded into feature vectors and clustered to isolate groups in each sensor modality. For the clustering comparison, four texture representations were learned by training two convolutional neural network texture classifiers on two general texture datasets. The results demonstrate that unsupervised texture-driven clustering can isolate roughness categories and process anomalies in each sensor modality. These groups can be labeled by a field expert and potentially used for defect characterization in process monitoring.

    Committee: Tanvi Banerjee Ph.D. (Advisor); Thomas Wischgoll Ph.D. (Committee Member); John Middendorf Ph.D. (Committee Member) Subjects: Computer Science; Materials Science
  • 11. Choudhary, Rishabh Construction and Visualization of Semantic Spaces for Domain-Specific Text Corpora

    MS, University of Cincinnati, 2021, Engineering and Applied Science: Electrical Engineering

    An important objective in Natural Language Processing is representing pieces of text numerically through the process of text embedding. Recent language models and text encoders have proved successful in generating high-quality embeddings that perform well on tasks such as sentiment analysis, question answering, and summarization. Many of these models are available pre-trained on enormous amounts of data, providing downstream applications with general-purpose semantic spaces. A useful application of text embeddings is creating a semantic space on a specific topic from a specialized dataset. This semantic space can be used to track the trajectory of a piece of text to see where the “train of thought” is going. In this type of application, the performance of embeddings on downstream tasks is less important than the relationships among the embeddings themselves. Specifically, semantically similar units of text should have embeddings that are close to each other. Most text embedding methods produce embeddings in high-dimensional spaces, with dimensionality ranging from a few hundred to thousands. However, it is often useful to visualize semantic spaces in very low dimensions, which requires dimensionality reduction. It is not clear which language models and which dimensionality reduction methods work well in these cases. This thesis provides a method for evaluating combinations of embedding methods and dimensionality reduction methods. Using the results of this analysis, a method for creating a cognitive map from a small, specialized dataset is implemented and evaluated.

    Committee: Ali Minai Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Yizong Cheng Ph.D. (Committee Member); Simona Doboli Ph.D. (Committee Member) Subjects: Artificial Intelligence
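The requirement that semantically similar texts stay close can be checked after dimensionality reduction. One simple criterion, assumed here since the abstract does not state the thesis's exact evaluation, is neighbor preservation: the fraction of each point's k nearest neighbors in the original embedding space that remain among its k nearest neighbors after reduction. The sketch uses PCA (via SVD) as the reducer and random vectors as stand-ins for text embeddings. All names, sizes, and the choice of PCA are illustrative.

```python
import numpy as np

def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of each row's k nearest neighbors (Euclidean), excluding the row itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def pca(X: np.ndarray, dim: int) -> np.ndarray:
    """Project centered data onto its top `dim` principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def neighbor_preservation(X_high: np.ndarray, X_low: np.ndarray, k: int = 5) -> float:
    """Mean fraction of k-nearest neighbors shared by the high- and low-dimensional spaces."""
    hi, lo = knn_indices(X_high, k), knn_indices(X_low, k)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(hi, lo)]))

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 64))   # stand-in for text embeddings (real ones have hundreds of dims)
score = neighbor_preservation(emb, pca(emb, 2), k=10)
print(f"2-D PCA keeps {score:.0%} of 10-nearest neighbors")
```

A score of 1 means every local neighborhood survives the reduction. Comparing this score across embedding models and reducers is one way to rank such combinations.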
  • 12. Chennupati, Nikhil Recommending Collaborations Using Link Prediction

    Master of Science (MS), Wright State University, 2021, Computer Science

    Link prediction in scientific collaboration networks refers to determining whether a connection between two entities in an academic network may emerge in the future. This study aims to analyze the relevance of academic collaborations and identify the factors that drive co-author relationships in a heterogeneous bibliographic network. Using topological, semantic, and graph representation learning techniques, we measure the authors' similarities with respect to their structural and publication data to identify the factors that promote co-authorship. Experimental results show that the proposed approach successfully infers co-author links by identifying authors with similar research interests. Such a system can be used to recommend potential collaborations among authors.

    Committee: Tanvi Banerjee Ph.D. (Advisor); Krishnaprasad Thirunarayan Ph.D. (Committee Member); Michael L. Raymer Ph.D. (Committee Member) Subjects: Artificial Intelligence; Computer Science
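The abstract does not list its topological measures. Common-neighbor scores such as the Jaccard coefficient and the Adamic–Adar index are standard examples and are assumed here for illustration. The sketch scores currently unlinked author pairs in a small, hypothetical co-author graph and recommends the highest-scoring pairs.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_graph(edges):
    """Undirected co-author graph as {author: set of co-authors}."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

def jaccard(graph, u, v) -> float:
    """Shared co-authors divided by all co-authors of either author."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

def adamic_adar(graph, u, v) -> float:
    """Sum of 1 / log(degree) over common neighbors; rarer shared co-authors weigh more."""
    common = graph[u] & graph[v]
    # Guard against log(1) = 0 (a common neighbor normally has degree >= 2 anyway).
    return sum(1.0 / math.log(len(graph[w])) for w in common if len(graph[w]) > 1)

def recommend(graph, k=3):
    """Top-k unlinked author pairs ranked by Adamic-Adar score."""
    pairs = [(u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]]
    return sorted(pairs, key=lambda p: adamic_adar(graph, *p), reverse=True)[:k]

# Hypothetical collaboration data.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
g = build_graph(edges)
print(recommend(g, k=2))
```

Here authors A and D share two co-authors (B and C) but have never collaborated, so they are the top recommendation.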
  • 13. Tallo, Philip Using Sentence Embeddings for Word Sense Induction

    MS, University of Cincinnati, 2020, Engineering and Applied Science: Computer Science

    One of the primary goals of Natural Language Processing is to create very high-quality text embeddings that can be used in many domains. The main area in which text embedding methods typically fall short is polysemy detection. A word is polysemous when it has multiple meanings (e.g., the word bank in a financial context versus an ecological context). Current text embedding methods fail to handle this at all, training just one embedding for all meanings of a word. Discovering methods for polysemy detection is an active area of research. This thesis presents a Word Sense Induction (WSI) system based on the hypothesis that clustering sentence embeddings also yields a clustering over sense embeddings. To test this hypothesis, this thesis uses the SemEval 2010 benchmark to evaluate the Sentence-based WSI (S-WSI) methodology and compare it with state-of-the-art methods in the field. This benchmark is based on four key metrics: homogeneity, completeness, precision, and recall. The key advantage of the proposed approach over other methods is adaptability: the S-WSI methodology can use any sentence embedding model or clustering method, making it highly adaptable to the user's domain-specific needs. The method is highly dependent on the sentence embedding model used; some models achieve near state-of-the-art performance, whereas others perform only slightly better than random.

    Committee: Ali Minai Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Anca Ralescu Ph.D. (Committee Member) Subjects: Computer Science
  • 14. Chang, Hsiu-Chen New Mixed-Mode Chireix Outphasing Theory and Frequency-Agile Clockwise-Loaded Class-J Theory for High Efficiency Power Amplifiers

    Doctor of Philosophy, The Ohio State University, 2020, Electrical and Computer Engineering

    A new design methodology providing optimal mixed-mode operation for dual-input class-F outphasing Chireix amplifiers is presented. The design starts with single-transistor class-F simulations at the intrinsic I-V reference planes to directly select the optimal peak and backoff resistive loads Rmin and Rmax and the input RF gate drives yielding the best combination of efficiencies and output powers, without needing to perform a load-pull simulation or measurement. New analytic equations expressed only in terms of Rmin and Rmax are given for designing the Chireix combiner at the current-source reference planes. Nonlinear embedding is then used to predict the incident power and the multi-harmonic source and load impedances required at the package reference planes to physically implement the power amplifier (PA). An analytic formula expressed solely in terms of Rmin and Rmax is reported for the peak and backoff outphasing angles required at the PA input reference planes. A Chireix outphasing PA designed with two 15-W GaN HEMTs exhibits a peak efficiency of 72.58% at a peak power of 43.97 dBm and an 8-dB backoff efficiency of 75.22% at 1.9 GHz. Measurements with 10-MHz LTE signals with 9.6-dB PAPR yield 59.4% average drain efficiency at 1.9 GHz while satisfying the 3GPP linearity requirements. A novel frequency-agile PA, designed with a modified class-J theory that enforces constant maximum and minimum instantaneous drain voltages at all frequencies, is presented. The resulting high-efficiency class-J mode, which requires a reconfigurable drain supply, exhibits clockwise fundamental and second-harmonic load impedance trajectories versus frequency, facilitating the PA design. This clockwise-loaded class-J (CLCJ) mode enables frequency-agile capability with enhanced efficiency when the proper drain supply voltage, co-designed with the clockwise fundamental and harmonic loads, is applied. A broadband power amplifier designed with the clockwise-loaded class-J theory is selected for demo…

    Committee: Patrick Roblin (Advisor); Ayman Fayed (Committee Member); Waleed Khalil (Committee Member) Subjects: Electrical Engineering
  • 15. Chen, Huiyuan Dimension Reduction for Network Analysis with an Application to Drug Discovery

    Doctor of Philosophy, Case Western Reserve University, 2020, EECS - Computer and Information Sciences

    Graphs (or networks) naturally represent valuable information for relational data, which are ubiquitous in real-world applications such as social networks, recommender systems, and biological networks. Statistical learning and machine learning techniques for network analysis, such as random walk with restart, meta-path analysis, network embeddings, and matrix/tensor factorizations, have gained tremendous attention recently. With the rapid growth of data, networks, whether homogeneous or heterogeneous, can consist of billions of nodes and edges. How can we find underlying structures within a network? How can we efficiently manage data when multiple sources describing the networks are available? How can we detect the most important relationships among nodes? To gain insight into these problems, this dissertation investigates the principles and methodologies of dimension reduction techniques that explore the useful latent structures of one or more networks. Our dimension reduction techniques mainly leverage recent developments in linear algebra, graph theory, large-scale optimization, and deep learning. In addition, we translate our ideas and models to several real-world applications, especially drug repositioning, drug combinations, and drug-target-disease interactions. For each research problem, we discuss its current challenges and related work and propose corresponding solutions.

    Committee: Jing Li Dr. (Committee Chair); Harold Connamacher Dr. (Committee Member); Xusheng Xiao Dr. (Committee Member); Satya Sahoo Dr. (Committee Member) Subjects: Computer Science
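Random walk with restart, the first technique named above, scores every node's proximity to a seed node. It iterates r ← (1 − c) W r + c e, where W is the column-normalized adjacency matrix, e is the seed's indicator vector, and c is the restart probability. The minimal power-iteration sketch below assumes an undirected graph with no isolated nodes and c = 0.15, a common but arbitrary default. It is a generic illustration, not the dissertation's code.

```python
import numpy as np

def random_walk_with_restart(A, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary RWR proximity of every node to `seed` (power iteration).

    Assumes every node has at least one edge, so no column sum is zero.
    """
    A = np.asarray(A, dtype=float)
    W = A / A.sum(axis=0, keepdims=True)     # column-stochastic transition matrix
    e = np.zeros(A.shape[0])
    e[seed] = 1.0                            # restart distribution concentrated on the seed
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * W @ r + restart * e
        if np.abs(r_next - r).sum() < tol:   # L1 change; the update is a (1 - c)-contraction
            return r_next
        r = r_next
    return r

# Path graph 0 - 1 - 2 - 3, seeded at node 0.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(np.round(random_walk_with_restart(A, seed=0), 3))
```

At convergence r equals c (I − (1 − c) W)⁻¹ e, so small graphs can be checked against a direct linear solve. Iteration avoids that inverse for the large networks the abstract describes.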
  • 16. Zhu, Xiaoting Systematic Assessment of Structural Features-Based Graph Embedding Methods with Application to Biomedical Networks

    PhD, University of Cincinnati, 2020, Engineering and Applied Science: Computer Science and Engineering

    Graphs arise naturally in many complex systems, where they are used to represent entities and the relationships between them. The analysis of graph-based models has wide applications, such as evaluating the significance of interactions between individual entities, identifying important subcomponents, discovering hidden interactions, and making complex inferences about the functions of the underlying systems. Many of these applications require meaningful representations of nodes, and several graph-embedding algorithms have recently been developed to embed nodes in meaningful vector spaces. However, it is not clear how the performance of these algorithms depends on the structural features of graphs, which can vary considerably across real-world domains. It would thus be useful to identify the main features that influence the performance of embedding approaches and to develop a method that can determine the most suitable embedding method for any given graph. The research described in this dissertation applies a systematic approach to comparing various graph-embedding methods on several types of graphs, relates their performance to the structural features of the graphs, and develops a system to select the best embedding method based on graph features. By evaluating node embedding algorithms for link prediction on several synthetic graph models and real-world network datasets, this study demonstrates that the structural properties of a graph have a significant effect on how well a given node embedding algorithm performs on it. For a particular graph, the performance of a node embedding algorithm can be predicted from its structural properties, and this relationship holds across a wide range of network types and real-world networks. The results in this dissertation lead to several insights about which algorithms work for various types of graphs.

    Committee: Ali Minai Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Yizong Cheng Ph.D. (Committee Member); Jaroslaw Meller Ph.D. (Committee Member); Carla Purdy Ph.D. (Committee Member) Subjects: Computer Science
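The abstract does not list which structural features it uses. As an assumed illustration, the sketch computes a few standard ones (node and edge counts, density, average degree, and average local clustering coefficient) for an undirected graph stored as adjacency sets. A feature vector like this is the kind of input a selector could map to a recommended embedding method. The function name and the example graph are hypothetical.

```python
from itertools import combinations

def structural_features(adj: dict) -> dict:
    """A few standard features of a non-empty undirected graph given as {node: set of neighbors}."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2     # each edge is stored twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(v) -> float:
        """Fraction of pairs of v's neighbors that are themselves linked (0 if degree < 2)."""
        nbrs = adj[v]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "avg_degree": 2 * m / n,
        "avg_clustering": sum(local_clustering(v) for v in adj) / n,
    }

# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
print(structural_features({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}))
```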
  • 17. Ngwobia, Sunday Capturing Knowledge of Emerging Entities from the Extended Search Snippets

    Master of Computer Science (M.C.S.), University of Dayton, 2019, Computer Science

    Google and other search engines support entity search by presenting a knowledge card that summarizes related facts about the user-supplied entity. However, the knowledge card is limited to entities that have a Wiki page or an entry in encyclopedias such as Freebase. Current encyclopedias are limited to highly popular entities, which are far fewer than the emerging entities. Although knowledge about emerging entities is available in search results, there are no approaches to capture, abstract, summarize, fuse, and validate the fragmented pieces of knowledge about them. Thus, in this paper, we develop approaches to capture two types of knowledge about emerging entities from a corpus extended from the top-n search snippets of a given emerging entity. The first kind of knowledge identifies the role(s) of the emerging entity, e.g., who is s/he? The second kind captures the entities closely associated with the emerging entity. As the testbed, we considered a collection of 20 emerging entities and 20 popular entities as the ground truth. Our approach is unsupervised and based on text analysis and entity embeddings. Our experimental studies show promising results, with an accuracy of more than 87% for recognizing entities and 75% for ranking them. In addition, 87% of the entailed types were recognizable. Our testbed and source code are available on GitHub (https://github.com/sunnyUD/research_source_code).

    Committee: Saeedeh Shekarpour Ph.D (Committee Chair); Ju Shen Ph.D (Committee Member); Zhongmei Yao Ph.D (Committee Member); Tam Nguyen Ph.D (Committee Member); James Buckley Ph.D (Advisor) Subjects: Computer Science; Information Systems
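    The abstract states that associated entities are ranked using entity embeddings but does not give the scoring function. A minimal sketch, assuming cosine similarity as the score and made-up two-dimensional vectors (real entity embeddings would come from a trained model and have far more dimensions), is shown below; the entity names and vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_associated(target, candidates):
    """Return (name, score) pairs for candidate entities, most similar to `target` first."""
    scored = [(name, cosine(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-D embeddings, for illustration only.
target = [1.0, 0.0]
candidates = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}
ranking = rank_associated(target, candidates)
```

    Here the ranking is a, c, b. Cosine similarity is a common choice because it ignores vector length; the thesis may score candidates differently.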
  • 18. Sun, Jiankai Directed Graph Analysis: Algorithms and Applications

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. In practice, however, they may contain thousands of cycles, as they are often created through crowdsourcing. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this thesis, we first address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches to this problem either require manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies that leverage the inferred hierarchy to remove a small subset of edges and make the graph acyclic. We then apply our cycle-breaking technique to the problem of Question Difficulty and Expertise Estimation (QDEE) in Community Question Answering (CQA) sites such as Yahoo! Answers and Stack Overflow. Our framework QDEE tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic and language-conscious solutions. We bring to bear a key language-agnostic insight: users gain expertise over time and therefore tend to ask as well as answer increasingly difficult questions. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within this model. Difficulty levels of newly posted questions (the cold-start problem) are estimated using our QDEE framework and additional textual features. 
We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question …

    Committee: Srinivasan Parthasarathy (Advisor); Huan Sun (Committee Member); Eric Fosler-Lussier (Committee Member); Darren Drewry (Committee Member) Subjects: Computer Engineering; Computer Science
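    One simple way to use an inferred hierarchy for breaking cycles is to keep only the edges that point strictly upward in the hierarchy: if every remaining edge goes from a lower level to a higher one, no cycle can survive. The sketch below assumes the levels are already given; they are hypothetical inputs standing in for the scores the thesis infers (for example, with a Bayesian skill rating system or the agony metric). The thesis devises several strategies, and this rule is only an illustration, not necessarily one of them. Kahn's algorithm is included to check acyclicity.

```python
from collections import deque

def break_cycles(edges, level):
    """Keep edges u -> v with level[u] < level[v]; return (kept, removed).

    Every kept edge strictly increases the level, so the kept graph is acyclic.
    """
    kept, removed = [], []
    for u, v in edges:
        (kept if level[u] < level[v] else removed).append((u, v))
    return kept, removed

def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is acyclic iff all its nodes can be topologically ordered."""
    nodes = {x for edge in edges for x in edge}
    indegree = {x: 0 for x in nodes}
    successors = {x: [] for x in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(x for x in nodes if indegree[x] == 0)
    ordered = 0
    while queue:
        x = queue.popleft()
        ordered += 1
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)
    return ordered == len(nodes)

# Toy taxonomy with hyponym -> hypernym edges and one erroneous back edge.
edges = [("dog", "mammal"), ("mammal", "animal"), ("animal", "dog")]
level = {"dog": 0, "mammal": 1, "animal": 2}  # hypothetical inferred levels
kept, removed = break_cycles(edges, level)
```

    With these levels, only the erroneous edge animal → dog is removed, and the remaining graph is acyclic.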
  • 19. Moon, Gordon Parallel Algorithms for Machine Learning

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    Machine learning is becoming an integral part of everyday life. Therefore, developing high-performance machine learning algorithms is becoming increasingly important from the perspectives of performance, efficiency, and optimization. The current solution is to use machine learning frameworks such as TensorFlow, PyTorch and CNTK, which enable us to utilize specialized architectures such as multi-core CPUs, GPUs, TPUs and FPGAs. However, while many machine learning frameworks facilitate high productivity, they are not designed for high performance. There is a significant gap between the performance achievable by these frameworks and the peak compute capability of current architectures. To accelerate machine learning algorithms on large-scale data, it is essential to develop architecture-aware machine learning algorithms. Since many machine learning algorithms are very computationally demanding, parallelization has garnered considerable interest. To achieve high performance, data locality optimization is critical, since the cost of moving data from memory is significantly higher than the cost of performing arithmetic/logic operations on current processors. However, the design and implementation of new machine learning algorithms has been driven largely by a focus on computational complexity. In this dissertation, the parallelization of three widely used machine learning algorithms, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Word2Vec, is addressed with a focus on minimizing data movement overhead through the memory hierarchy, using techniques such as 2D tiling and rearrangement of data computation. While developing each parallel algorithm, a systematic analysis of the algorithm's data access patterns and data movements is performed, and suitable algorithmic adaptations and parallelization strategies are developed for both multi-core CPU and GPU platforms. 
Experimental resul…

    Committee: P. Sadayappan (Advisor); Srinivasan Parthasarathy (Committee Member); Eric Fosler-Lussier (Committee Member) Subjects: Computer Science
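    The abstract credits 2D tiling with reducing data movement through the memory hierarchy. The loop structure behind that idea can be illustrated on a dense matrix product, the kind of kernel that appears in NMF's multiplicative update rules (e.g., computing WᵀV). The sketch below is a generic blocked matrix multiply in plain Python, not the dissertation's CPU or GPU code; the tile size is an arbitrary assumption, and Python itself will not show the cache benefits that the same loop order gives in compiled code.

```python
def matmul_naive(A, B):
    """Reference product C = A B for matrices stored as lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_tiled(A, B, tile=2):
    """2D-tiled product: compute C one tile x tile block at a time, sweeping the
    matching blocks of A and B so each loaded block is reused several times
    before the loops move on (the data-locality idea behind tiling)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0 for _ in range(m)] for _ in range(n)]
    for i0 in range(0, n, tile):            # block row of C
        for j0 in range(0, m, tile):        # block column of C
            for p0 in range(0, k, tile):    # block of the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]         # reused across the j loop
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
B = [[1, 0, 2], [0, 1, 3], [4, 5, 6], [7, 8, 9]]
C = matmul_tiled(A, B)
```

    The tiled version gives the same result as the naive one, including when the tile size does not divide the matrix dimensions; only the order of the inner-loop work changes.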
  • 20. Zha, Xiao Topological Data Analysis on Road Network Data

    Master of Mathematical Sciences, The Ohio State University, 2019, Mathematical Sciences

    Many problems in science and engineering involve signal analysis, and engineers and scientists have developed many approaches to studying signals. Recently, researchers proposed a new framework that combines time-delay embedding with tools from computational topology to study periodic signals. When time-delay embedding is applied to a periodic signal, its periodic behavior appears as topological cycles, and persistent homology can be used to detect these topological features. In this thesis, we apply this method to analyze road network data, specifically vehicle flow data recorded by detectors placed on highways. First, we apply time-delay embedding to map the vehicle flow data to a point cloud in a high-dimensional space. Then, we use persistent homology tools to detect the topological features and obtain a persistence diagram. Next, we repeat the same experiment on vehicle flow data from different periods; for example, in our experiment, we use vehicle flow data from different weeks and months. This yields persistence diagrams corresponding to the vehicle flow data of different periods. Finally, we compute the bottleneck distance and Wasserstein distance between these persistence diagrams and perform hierarchical clustering. The dendrograms from the hierarchical clustering reveal the patterns behind the vehicle flow data.

    Committee: Facundo Mémoli (Advisor); Yusu Wang (Advisor) Subjects: Mathematics
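    The first step of the pipeline described above, time-delay embedding, maps a scalar series x to points (x[t], x[t + τ], ..., x[t + (d - 1)τ]). A minimal sketch follows, using a synthetic sine wave in place of real vehicle flow counts; the embedding dimension d and delay τ are illustrative assumptions. With a period of 24 samples, d = 2 and τ = 6 (a quarter period), each point equals (sin θ, cos θ) and lies on the unit circle, the kind of loop that persistent homology would detect as a 1-dimensional feature. Computing persistence diagrams and bottleneck or Wasserstein distances would require a dedicated topology library and is not shown.

```python
import math

def delay_embedding(signal, dim, tau):
    """Time-delay embedding: map x to points (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    span = (dim - 1) * tau  # how far ahead the last coordinate looks
    if dim < 1 or tau < 1 or span >= len(signal):
        return []
    return [tuple(signal[t + i * tau] for i in range(dim))
            for t in range(len(signal) - span)]

# Synthetic periodic signal: two periods of a sine wave with period 24 samples.
period = 24
signal = [math.sin(2 * math.pi * t / period) for t in range(2 * period)]
points = delay_embedding(signal, dim=2, tau=period // 4)

# Each point is (sin θ, cos θ), so its distance from the origin is 1.
radii = [math.hypot(x, y) for x, y in points]
```

    A 48-sample signal with a delay span of 6 yields 42 points. For real traffic data, d and τ would need to be chosen from the data, for instance relative to the daily period.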