Search Results (1 - 21 of 21 Results)

Tasan, Murat. Distance-Based Indexing: Observations, Applications, and Improvements
Doctor of Philosophy, Case Western Reserve University, 2006, Computing and Information Science
Multidimensional indexing has long been an active research problem in computer science. Most solutions map complex data types to high-dimensional vectors of fixed length and apply either Spatial Access Methods (SAMs) or Point Access Methods (PAMs) to the vectorized data. More recently, however, this approach has reached its limitations: much current data is either difficult to map to a fixed-length vector (such as arbitrary-length strings) or maps only to a very high number of dimensions. In both cases, distance-based indexing is an attractive alternative, relying only on pairwise distances between data items to build indices that support efficient similarity search. In this work, distance-based indexing is first approached in a general fashion: a framework is laid out that encompasses distance-based indexing methods as well as SAMs and PAMs. Shared properties of seemingly unrelated data structures can be exploited, as shown by the presentation of a single (and optimal) search algorithm that works on a variety of trees for a variety of search types. The motivation for distance-based indexing is then demonstrated through an application to string indexing (specifically, biological sequences). By simply showing that a distance function satisfies the properties of a metric, it is illustrated that many forms of data, with varied distribution characteristics, can be indexed with distance-based methods. Finally, a probabilistic approach to indexing leads to an improved tree construction algorithm, as well as an information-based search algorithm that exploits the information stored in any data structure regardless of its form (whether the structure is a tree or a matrix, the algorithm performs equally well).

Committee:

Z. Meral Ozsoyoglu (Advisor)

Subjects:

Computer Science

Keywords:

distance-based indexing; probabilistic indexing; data structures
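To make the core idea concrete, the following is a minimal sketch (not the dissertation's actual algorithm) of distance-only pruning via the triangle inequality: distances from every item to a pivot are precomputed, and a range query evaluates the expensive metric only on items the pivot bound cannot rule out. The word list and pivot are invented examples.

```python
# Minimal sketch of pivot-based, distance-only pruning, the core idea behind
# distance-based indexes such as VP-trees. All names here are illustrative.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance: a metric on strings.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return prev[n]

def range_search(data, pivot_dists, pivot, query, radius):
    """Return items within `radius` of `query`, skipping any item whose
    precomputed pivot distance rules it out by the triangle inequality:
    |d(q, p) - d(x, p)| <= d(q, x)."""
    dq = edit_distance(query, pivot)
    hits, evaluated = [], 0
    for item, dp in zip(data, pivot_dists):
        if abs(dq - dp) > radius:      # cannot be within radius: prune
            continue
        evaluated += 1
        if edit_distance(query, item) <= radius:
            hits.append(item)
    return hits, evaluated

words = ["index", "indices", "indexing", "vector", "metric", "matrix"]
pivot = "index"
dists = [edit_distance(w, pivot) for w in words]
hits, evaluated = range_search(words, dists, pivot, "indexes", 2)
```

Note that the pruning step never computes a distance to the query itself, which is exactly the property that makes such indices applicable to data without a vector representation.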

Bobik, Sergei. Edge-Suppressed Color Image Indexing and Retrieval Using Angle-Distance Measurement in the Scaled-Space of Principal Components
Master of Science (MS), Ohio University, 2000, Electrical Engineering & Computer Science (Engineering and Technology)

Committee:

Mehmet Celenk (Advisor)

Keywords:

image indexing; color indexing; angle-distance measurement; scaled-space; principal components

Li, Wei. Hierarchical Summarization of Video Data
MS, University of Cincinnati, 2007, Engineering : Computer Science
Digital video content is appearing in many new applications, creating a need for efficient video indexing, quick browsing, and easy retrieval of content of interest within video clips. In this thesis, we analyze video structure and use a hierarchical summarization strategy to produce a multilevel video summary. Keyframes are extracted from each video shot by comparing the similarity between frames. Using an affinity matrix that records the similarity of every pair of keyframes, similar keyframes are clustered, and adjacent shots within a specified distance are merged into groups. These groups are organized into scenes, which are independent story units. From these scenes, representative frames are selected to construct a hierarchical summary.

Committee:

Dr. Chia-Yung Han (Advisor)

Subjects:

Computer Science

Keywords:

Video data processing; Video summarization; Video content structure; video indexing
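The grouping step described in the abstract can be sketched roughly as follows: an affinity matrix of pairwise keyframe similarities, with a shot merged into the current group when a sufficiently similar shot lies within a fixed window. The histograms, window, and threshold are made-up stand-ins for real keyframe features, not the thesis's parameters.

```python
# Illustrative sketch: affinity-matrix-based grouping of adjacent shots.

def similarity(h1, h2):
    # Histogram intersection, a common frame-similarity measure in [0, 1].
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

def group_shots(keyframes, window=2, threshold=0.8):
    n = len(keyframes)
    # Affinity matrix: similarity of every pair of keyframes.
    affinity = [[similarity(keyframes[i], keyframes[j]) for j in range(n)]
                for i in range(n)]
    groups = [[0]]
    for s in range(1, n):
        # Merge shot s into the current group if any of the last `window`
        # shots in that group is sufficiently similar; else start a new group.
        if any(affinity[s][t] >= threshold for t in groups[-1][-window:]):
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

For example, four toy color histograms where the first two shots resemble each other and the last two form a second cluster yield two groups, `[[0, 1], [2, 3]]`.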

Meqdadi, Omar Mohammed. Understanding and Identifying Large-Scale Adaptive Changes from Version Histories
PHD, Kent State University, 2013, College of Arts and Sciences / Department of Computer Science
A systematic study of the adaptive maintenance process is undertaken. The research aims to better understand how developers adapt and migrate systems in response to events such as large API changes. The ultimate goal is to support the construction of automated methods and tools for the adaptive maintenance process. The main case study involves an exhaustive manual investigation of a number of open source systems (e.g., KOffice, Extragear/graphics, and OpenGL) during a time when a large adaptive maintenance task was taking place. In each case the adaptive maintenance task involved a substantial API migration (e.g., Qt3 to Qt4) that took place over multiple years. Additionally, the systems were also undergoing other (perfective and corrective) modifications, such as bug fixing and the addition of new features. The main goal of the study was to identify and distinguish the adaptive maintenance changes from the other types of changes. These adaptive maintenance commits are then analyzed to identify common characteristics and trends. The analysis examines the amount of change in each commit, the vocabulary of the commit messages, the authorship of the changes, and the stereotypes of modified methods. The data provides a point of reference for the study of these types of changes, and this is the first published in-depth, systematic examination of large adaptive maintenance tasks. The results show that adaptive maintenance tasks involve relatively few large changes, that few developers are involved in the task, and that those developers use a somewhat standard vocabulary in describing the associated commits. This information is then used to automatically identify adaptive changes: an information retrieval technique, namely Latent Semantic Analysis, is used to retrieve relevant adaptive commits when querying the commits available in the version control system.
Our results show that the approach accurately retrieves relevant adaptive commits, with nearly 90% recall. Additionally, a means is developed to uncover traceability links between source code files and other artifacts resulting from adaptive maintenance tasks. The validation results show highly precise predictions using TraceLab components.

Committee:

Jonathan Maletic, Professor (Advisor)

Subjects:

Computer Science

Keywords:

Adaptive Maintenance; Commit; Software Engineering; Stereotype; Latent Semantic Indexing; Traceability; Version History; Topic Modeling
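The LSA-style retrieval described above can be sketched on a toy corpus: build a term-by-commit matrix, take a truncated SVD, and rank commit messages by cosine similarity to a query folded into the latent space. The commit messages, query, and latent dimension here are invented illustrations, not the dissertation's data or settings.

```python
import numpy as np

# Toy Latent Semantic Analysis over commit messages (illustrative corpus).
commits = [
    "port widgets from qt3 to qt4 api",
    "migrate rendering code to qt4",
    "fix crash in file dialog",
    "add new export feature",
]
query = "qt4 api migration"

vocab = sorted({w for c in commits for w in c.split()})

def vec(text):
    # Raw term counts over the corpus vocabulary (no tf-idf, for brevity).
    return np.array([text.split().count(w) for w in vocab], dtype=float)

A = np.stack([vec(c) for c in commits], axis=1)   # terms x commits
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                             # latent dimension (arbitrary)
docs = (np.diag(s[:k]) @ Vt[:k]).T                # commit coordinates
q = U[:, :k].T @ vec(query)                       # fold query into latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

ranked = sorted(range(len(commits)), key=lambda i: -cosine(docs[i], q))
```

On this corpus the two Qt-migration commits rank above the unrelated bug-fix and feature commits, even though the query shares no exact token with the second commit beyond "qt4".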

Landry, Bertrand Clovis. A Theory of Indexing: Indexing Theory as a Model for Information Storage and Retrieval
Doctor of Philosophy, The Ohio State University, 1971, Graduate School

Committee:

Not Provided (Other)

Subjects:

Computer Science

Keywords:

Indexing; Information storage and retrieval systems

Zhang, Shijie. Index-Based Graph Querying and Matching in Large Graphs
Doctor of Philosophy, Case Western Reserve University, 2010, EECS - Computer and Information Sciences
Currently, a huge amount of data can be naturally represented by graphs, e.g., protein interaction networks, gene regulatory networks, etc. The size of an application graph may vary from tens of vertices to millions of vertices, and rich information can be retrieved if proper tools are provided. We are interested in applying index-based graph querying and matching techniques to both large and massive graphs. For the graph querying problem in a database composed of multiple small graphs, we use frequent subtrees: subtree-based indexing algorithms are efficient and effective in finding the supergraphs of any given query graph. For the graph matching problem in a relatively large database graph, we propose a distance-based index structure; optimized by a dynamic matching scheme, the algorithm can quickly find all matches of any given query graph in the database graph. For graph matching in a massive database graph, we use a twofold index based on label combinations and shortest path trees. Last but not least, we discuss future work on index-based graph querying and matching algorithms.

Committee:

Jiong Yang (Committee Chair)

Subjects:

Computer Science

Keywords:

graph indexing; graph matching; subgraph search
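A generic sketch of the distance-based pruning idea (not the dissertation's actual index): precompute BFS distances from a few landmark vertices, then discard candidate vertex pairs whose landmark distances already violate the triangle-inequality lower bound. The graph and landmarks below are illustrative.

```python
from collections import deque

def bfs_dist(adj, src):
    # Unweighted shortest-path distances from src via breadth-first search.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # a path graph
landmarks = [0, 4]
index = {L: bfs_dist(adj, L) for L in landmarks}

def may_match(v1, v2, required):
    # |dist(v1, L) - dist(v2, L)| lower-bounds d(v1, v2); prune if any
    # landmark's bound already exceeds the required distance.
    return all(abs(index[L][v1] - index[L][v2]) <= required
               for L in landmarks)

# Candidate pairs at distance <= 1: on a path graph, exactly the edges survive.
candidates = [(a, b) for a in adj for b in adj if a < b and may_match(a, b, 1)]
```

The point of such an index is that the cheap landmark check eliminates most candidate pairs before any expensive exact matching is attempted.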

Canahuate, Guadalupe M. Enhanced Bitmap Indexes for Large Scale Data Management
Doctor of Philosophy, The Ohio State University, 2009, Computer Science and Engineering
Advances in technology have enabled the production of massive volumes of data through observations and simulations in many application domains. These new data sets and the associated queries pose a new challenge for efficient storage and retrieval, requiring novel indexing structures and algorithms. We propose a series of enhancements to bitmap indexes that account for the inherent characteristics of large scale datasets and efficiently support the types of queries needed to analyze the data. First, we formalize how missing data should be handled and how queries should be executed in the presence of missing data. Then, we propose an adaptive code ordering, a hybrid between Gray code and lexicographic orderings, to reorganize the data and further reduce the size of the already compressed bitmaps. We address the inability of compressed bitmaps to directly access a given row by proposing an approximate encoding that compresses the bitmap into a hash structure. We also extend the existing run-length encoders of bitmap indexes by adding an extra word that represents future rows with zeros, minimizing the insertion overhead of new data. We propose a comprehensive framework to execute similarity searches over bitmap indexes without changes to the current bitmap structure, without accessing the original data, and using a similarity function that is meaningful in high-dimensional spaces. Finally, we propose a new encoding and query execution scheme for non-clustered bitmap indexes that combines several attributes in one existence bitmap, reduces storage requirements, and improves query execution and update time for low-cardinality attributes.

Committee:

Hakan Ferhatosmanoglu (Advisor); Gagan Agrawal (Committee Member); P. Sadayappan (Committee Member); Timothy Long (Committee Member)

Keywords:

bitmap index; scientific data management; large scale indexing
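A minimal bitmap-index sketch of the basic machinery the abstract builds on: one bitmap per attribute value, with conjunctive queries answered by bitwise AND over the bitmaps. Python integers serve as uncompressed bit vectors here; the dissertation's contributions (Gray-code reordering, approximate encoding, and so on) operate on compressed variants of this structure.

```python
from collections import defaultdict

class BitmapIndex:
    def __init__(self, rows):
        self.n = len(rows)
        self.maps = defaultdict(int)          # (column, value) -> bitmap
        for i, row in enumerate(rows):
            for col, val in row.items():
                self.maps[(col, val)] |= 1 << i   # set bit i for this value

    def query(self, **criteria):
        # Conjunction: AND the bitmaps, then enumerate set bits (row ids).
        bm = (1 << self.n) - 1
        for col, val in criteria.items():
            bm &= self.maps[(col, val)]
        return [i for i in range(self.n) if bm >> i & 1]

rows = [{"sensor": "A", "status": "ok"},
        {"sensor": "B", "status": "ok"},
        {"sensor": "A", "status": "fail"}]
idx = BitmapIndex(rows)
```

Because each predicate is a single bitwise operation over all rows, query cost grows with the number of predicates rather than the number of rows touched, which is why bitmaps suit low-cardinality scientific attributes.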

Ponnavaikko, Kovendhan. Advanced Indexing Techniques for File Sharing in P2P Networks
MS, University of Cincinnati, 2002, Engineering : Computer Science
File sharing has been the most popular service for which peer-to-peer (P2P) networks have been used in recent years, and it is expected to remain so for a long time. A P2P file-sharing service makes each user's machine a peer in a network of peers and allows users to share files. Users can issue queries to the network to find the locations of files of interest. The average size of a P2P network is much larger than that of a client-server network, and each node receives many queries every second, so it must search through its file indices several times per second to answer them. The time taken by nodes to respond to queries can have a large impact on the overall performance of the network, and proper indexing techniques can reduce query response time significantly. This thesis focuses on advanced indexing techniques for the filenames held by P2P nodes participating in a file-sharing service. As a case study, a special kind of P2P node, the supernode, was chosen. A supernode provides proxy and indexing services to nodes on slower network connections. Nodes connect to and disconnect from the supernode arbitrarily, so the problem of indexing filenames in the supernode is dynamic in nature. We consider two fundamentally different algorithmic models for dynamic indexing: the Merged Tree Model, in which the indices obtained from connecting nodes are combined into a single primary index maintained by the supernode, and the Vector Model, in which the indices obtained from connecting nodes are stored individually as a vector of indices. We provide a formal framework for analyzing the performance of the different models. Furthermore, we use simulations to verify the formal framework and to determine precise constant factors.
We conclude by demonstrating that a hybrid algorithm is optimal in terms of performance, and we suggest the parameters for optimizing it.

Committee:

Dr. Fred Annexstein (Advisor)

Subjects:

Computer Science

Keywords:

P2P networks; indexing; file sharing
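The trade-off between the two models can be illustrated with dictionaries standing in for the thesis's tree structures: a merged primary index makes lookups a single probe but makes disconnects expensive, while per-node indices make connects and disconnects trivial but force every lookup to scan all nodes. Filenames and node ids below are invented.

```python
class MergedIndex:
    # Merged Tree Model analogue: one combined index shared by all nodes.
    def __init__(self):
        self.index = {}                      # filename -> set of node ids

    def connect(self, node, files):
        for f in files:
            self.index.setdefault(f, set()).add(node)

    def disconnect(self, node):
        # Must touch the shared structure: scan every posting set.
        for owners in self.index.values():
            owners.discard(node)

    def lookup(self, f):
        return self.index.get(f, set())

class VectorIndex:
    # Vector Model analogue: one small index per connected node.
    def __init__(self):
        self.per_node = {}                   # node id -> set of filenames

    def connect(self, node, files):
        self.per_node[node] = set(files)     # O(1) attach

    def disconnect(self, node):
        self.per_node.pop(node, None)        # O(1) detach

    def lookup(self, f):
        # Every query scans all per-node indices.
        return {n for n, files in self.per_node.items() if f in files}
```

A hybrid, as the thesis concludes, would batch the per-node indices into the merged structure periodically, trading a bounded lookup scan for cheap churn handling.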

Jia, Yanan. Generalized Bilinear Mixed-Effects Models for Multi-Indexed Multivariate Data
Doctor of Philosophy, The Ohio State University, 2016, Statistics

In this dissertation, I propose a novel framework for classifying and describing multivariate data sets based on the number and structure of indexing variables. Using this framework, I then develop a Bayesian generalized bilinear mixed-effects model for multi-indexed multivariate data, and I demonstrate that this model is able to capture important features of affiliation network data.

The proposed novel framework, the multi-indexed multivariate data structure, classifies data into different categories based on the number and structure of indexing variables. I focus on two special cases of this type of data: double-indexed multivariate data with multiple-membership structure (termed Type A) and triple-indexed multivariate data with cross-classified structure (termed Type B). Several illustrative examples of data of these two types are provided.

Some existing multivariate statistical methods appropriate for analyzing Type A or Type B data are reviewed briefly. Methods are roughly classified into two categories according to their underlying purpose. One class of methods identifies patterns of dependence among the components of one variable Y via the indexing variables; this can be achieved by dimension reduction and graphical presentation. The other class quantifies the linear relationship between one variable Y and other variables X, while accounting for the dependence among the components of Y. In both cases, the dependence patterns among the components of Y may be complicated and existing methods may not be suitable. I demonstrate that existing methods are not able to capture fourth-order dependence, which is often present in Type A and Type B data.

A new statistical methodology, based on the Bayesian generalized bilinear mixed-effects model, is developed for Type A and Type B data. This model can be viewed as a tool for achieving the two desiderata for multivariate statistical methods described above: identifying and accounting for dependence. The model allows us to identify dependence patterns among the components of Y via a bilinear term, an inner product of two latent variables corresponding to two indexing variables. It is also suitable for studying the relationship between Y and X through a regression form while accounting for the dependence among the components of Y caused by repeated measures and/or unexplained fourth-order dependence. A Markov chain Monte Carlo algorithm is described for Bayesian inference. Data from the 2012 Summer Olympic Games are analyzed to illustrate the model.

The performance of the bilinear mixed-effects model-fitting algorithm is studied via the analysis of MCMC output arising from a series of simulation studies. The robustness of the methodology to model misspecification, particularly with respect to over-dispersion and the latent dimension of the bilinear random effects, is examined through these simulation studies.

Affiliation networks are a particular type of Type A data, recording relationships between a set of 'actors' and a set of 'events'. The generalized bilinear mixed-effects model captures the dependence patterns resulting from interactions between actors and events. In this setting, the model is used to explore patterns in extracurricular activity membership of students in a racially diverse high school in a Midwestern metropolitan area, while controlling for differences in participation by both activity characteristics and attributes of the students. Using techniques from spatial point pattern analysis, we show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents. In addition, household travel patterns are examined through latent variables associated with the geographic area of residence and the destination area of observed trips. In this case, the bilinear model highlights common travel behaviors of individuals residing inside the I-270 beltway of Columbus, OH.

Committee:

Catherine Calder (Advisor); Eloise Kaizar (Committee Member); Steve MacEachern (Committee Member)

Subjects:

Sociology; Statistics

Keywords:

Multi-Indexed Multivariate Data Sets; Indexing Variables; Bayesian Generalized Bilinear Mixed-Effects Model

Aladesulu, Olorunfemi Stephen. Improvement of Automatic Indexing through Recognition of Semantically Equivalent, Syntactically Different Phrases
Doctor of Philosophy, The Ohio State University, 1985, Graduate School

Committee:

Not Provided (Other)

Subjects:

Computer Science

Keywords:

Automatic indexing; Recognition

Alhindawi, Nouh Talal. Supporting Source Code Comprehension During Software Evolution and Maintenance
PHD, Kent State University, 2013, College of Arts and Sciences / Department of Computer Science
This dissertation addresses the problem of program comprehension in support of the evolution of large-scale software systems. The research concerns how software engineers locate features and concepts, and categorize changes, within very large bodies of source code and their versioned histories. More specifically, advanced Information Retrieval (IR) and Natural Language Processing (NLP) techniques are utilized and enhanced to support various software engineering tasks. This research is not aimed at directly improving IR or NLP approaches; rather, it is aimed at understanding how additional information can be leveraged to improve the final results. The work advances the field by investigating approaches to augment and re-document source code with different types of abstract behavioral information. The hypothesis is that enriching the source code corpus with meaningful descriptive information, and integrating this orthogonal (semantic and structural) information extracted from source code, will improve the results of IR methods for indexing and querying information. Moreover, adding this new information to a corpus is a form of supervision, in that a priori knowledge is often used to direct and supervise machine-learning and IR approaches. The main contributions of this dissertation involve improving on the results of previous work in feature location and source code querying. The dissertation demonstrates that the addition of statically derived information from source code (e.g., method stereotypes) can improve the results of IR methods applied to the problem of feature location. Further contributions include showing the effects of excluding certain textual information (comments and function calls) when performing source code indexing for feature/concept location.
Moreover, the dissertation demonstrates an IR-based method of natural language topic extraction that assists developers in gaining an overview of past maintenance activities based on software repository commits. The ultimate goal of this work is to reduce the costs, effort, and time of software maintenance by improving the results of previous work in feature location and source code querying, and by supporting a new platform for enhancing program comprehension and facilitating software engineering research.

Committee:

Jonathan Maletic, Professor (Advisor)

Subjects:

Computer Science

Keywords:

Software Comprehension; Software Evolution; Software Maintenance; Information Retrieval; Latent Semantic Indexing; Stereotypes; Traceability; Commits; Topic Modeling; Corpus; Feature Location; Concept Location; Source Code Querying

Abeysinghe, Ruvini Pradeepa. Signature Files for Document Management
MS, University of Cincinnati, 2001, Engineering : Computer Science
A document management system, called SDMS (Signature-file Document Management System), has been designed and implemented. The system is based on the signature file method: documents are stored in a database file and retrieved according to user queries. We implemented an application, Email Organizer, based on the SDMS. This system will be useful for individuals as well as organizations whose emails need to be stored, indexed, categorized, and then retrieved efficiently based on content. User queries can be formulated using keywords from the text, interrelated by the Boolean operators AND and OR. A novel algorithm based on linear feedback shift registers (LFSR) has also been developed and implemented for generating word signatures. The objective of this approach is to reduce the false drop ratio and to improve signature generation speed. The algorithm has been compared with hashing functions for performance. Experiments revealed that the LFSR algorithm is comparable to the hashing algorithm in terms of signature generation time and false drop ratio.

Committee:

Dr. Chia-Yung Han (Advisor)

Subjects:

Computer Science

Keywords:

signature files; text retrieval methods; indexing; document management system; pseudo-random number generation
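The signature-file mechanism can be sketched as follows: each word hashes to a few bits of a fixed-width signature, a document signature is the OR (superimposed coding) of its word signatures, and a query matches a document only if all query-word bits are present. The LFSR below is a generic 16-bit Fibonacci register; its taps, the signature width, and the bits-per-word count are arbitrary illustrative choices, not the thesis's actual parameters.

```python
WIDTH, BITS_PER_WORD = 64, 3

def lfsr_signature(word, width=WIDTH, bits=BITS_PER_WORD):
    # Seed a 16-bit Fibonacci LFSR (taps x^16 + x^14 + x^13 + x^11 + 1)
    # from the word's bytes, then let successive states pick signature bits.
    state = 0xACE1
    for ch in word.encode():
        state ^= ch
        for _ in range(8):
            bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)
    sig = 0
    for _ in range(bits):
        sig |= 1 << (state % width)
        state = (state * 0xACE1 + 1) & 0xFFFF   # stir for the next bit
    return sig

def doc_signature(text):
    # Superimposed coding: OR the word signatures together.
    sig = 0
    for w in text.lower().split():
        sig |= lfsr_signature(w)
    return sig

def matches(doc_sig, query_words):
    # The document MAY contain the query words iff all their bits are set;
    # false drops are possible, so a real system verifies against the text.
    q = 0
    for w in query_words:
        q |= lfsr_signature(w.lower())
    return doc_sig & q == q
```

By construction the filter has no false negatives: any word actually in the document always passes. The false drop ratio is the rate at which unrelated words happen to pass, which is what the LFSR versus hashing comparison measures.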

Gibas, Michael A. Improving Query Performance through Application-Driven Processing and Retrieval
Doctor of Philosophy, The Ohio State University, 2008, Computer Science and Engineering
The proliferation of massive data sets across many domains, and the need to gain meaningful insights from them, highlight the need for advanced data retrieval techniques. Because I/O cost dominates the time required to answer a query, sequentially scanning the data and evaluating each object against the query criteria is not an effective option for large data sets. An effective solution should read as small a subset of the data as possible and should address general query types. Access structures built over single attributes may also be ineffective because 1) they may not match the performance of an access structure that prunes over multiple attributes simultaneously, and 2) they may not be appropriate for queries whose results depend on functions of multiple attributes. Indexing a large number of dimensions is likewise ineffective, because either too many subspaces must be explored or the index structure becomes too sparse at high dimensionalities. The key is to find solutions that prune much of the search space while avoiding this 'curse of dimensionality'. This thesis pursues query performance enhancement by two primary means: 1) processing the query effectively based on the characteristics of the query itself, and 2) physically organizing access to data based on query patterns and data characteristics.
Query performance enhancements are described in the context of several novel applications including 1) Optimization Queries, which presents an I/O-optimal technique to answer queries when the objective is to maximize or minimize some function over the data attributes, 2) High-Dimensional Index Selection, which offers a cost-based approach to recommend a set of low dimensional indexes to effectively address a set of queries, and 3) Multi-Attribute Bitmap Indexes, which describes extensions to a traditionally single-attribute query processing and access structure framework that enables improved query performance.

Committee:

Hakan Ferhatosmanoglu (Advisor); Atanas Rountev (Committee Member); Hui Fang (Committee Member)

Subjects:

Computer Science

Keywords:

Databases; Query Processing; Indexing; High-Dimensional

Albin, Aaron. Building an Online UMLS Knowledge Discovery Platform Using Graph Indexing
Master of Science, The Ohio State University, 2014, Computer Science and Engineering
The UMLS is a rich collection of biomedical concepts connected by semantic relations. Using transitively associated information for knowledge discovery has been shown to be effective for many applications in the biomedical field. Although a few tools and methods are available for extracting transitive knowledge from the UMLS, they usually impose major restrictions on the length of transitive relations or on the number of data sources. To overcome these restrictions, the web platform onGrid was developed to support efficient path queries and knowledge discovery on the UMLS. The platform converts natural language queries into UMLS concepts, performs efficient path queries, and visualizes the result paths. It also builds relationship and distance matrices for two sets of biomedical terms, making it possible to perform effective knowledge discovery on these concepts. onGrid can be applied to study biomedical concept relations between any two sets, or within one set, of biomedical concepts. In this work, onGrid is used to study gene-gene relationships in HUGO as well as disease-disease relationships in OMIM. By cross-validating the results with external datasets, it is demonstrated that onGrid is an efficient tool for concept-based knowledge discovery on the UMLS: querying the UMLS for transitive relations, studying relationships between biomedical terms, and generating hypotheses. The online platform has been tested on the BMI Netlab server (URL: https://netlab.bmi.osumc.edu/ongrid).

Committee:

Yang Xiang (Advisor); Rajiv Ramnath (Committee Member)

Subjects:

Computer Science

Keywords:

UMLS;graph indexing;knowledge discovery
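The kind of transitive path query such a platform supports can be sketched as BFS over a directed concept graph, returning the shortest chain of relations between two terms. The concepts and edges below are invented placeholders, not actual UMLS content, and a real system would run this over an index rather than an in-memory dict.

```python
from collections import deque

# Toy directed concept graph (invented relations, not UMLS data).
relations = {
    "aspirin": ["cyclooxygenase"],
    "cyclooxygenase": ["prostaglandin", "inflammation"],
    "prostaglandin": ["fever"],
    "inflammation": [],
    "fever": [],
}

def shortest_path(graph, src, dst):
    # BFS with parent pointers; returns the concept chain or None.
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph.get(u, []):
            if v not in parent:
                parent[v] = u
                q.append(v)
    return None
```

Chaining such paths for every pair of terms in two input sets yields the relationship and distance matrices the abstract describes.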

Ozturk, Ozgur. Feature Extraction and Similarity-Based Analysis for Proteome and Genome Databases
Doctor of Philosophy, The Ohio State University, 2007, Computer and Information Science
Bioinformatics will boost our understanding of how life works and enhance medicine and biotechnology. Very large amounts of data are being produced by the experiments of researchers trying to decipher the complexity of life. In this dissertation, I present our methods for the search and analysis of microbiological sequence and 3D protein structure data. We developed methods to map genomic and proteomic sequences into metric feature-vector spaces in order to facilitate building index structures with practical, accurate, and sensitive filtering capabilities. Similarity distance functions between these N-gram frequency vectors and N-gram wavelet vectors are defined such that the distances preserve the desired properties of the original distance between the subsequences corresponding to the vectors. These vectors are indexed using a compressed, multiresolution, grid-style data structure for efficient pruning of candidates in the search space. Our method to index protein structures defines and utilizes spatial profiles, i.e., summaries constructed from the geometrical and biochemical properties that characterize the neighborhood around the geometrically significant sites of proteins. These features are then scored, using a statistical measure, for their ability to distinguish a family of proteins from a background set of unrelated proteins, and successful features are combined into a representative set for the protein family. Unlike most currently available methods, ours are able to capture structurally local motifs. The results verify that our method is successful both in identifying the distinctive sites of a given family of proteins and in classifying proteins using the extracted features.
These tools utilize accurate and compact representations of data together with better similarity measures, new data structures and algorithms, and apply data mining techniques in novel ways to help researchers extract information from very large data repositories and make better use of them.

Committee:

Hakan Ferhatosmanoglu (Advisor)

Keywords:

Bioinformatics; Structural Motifs; Sequence Indexing; Sequence Similarity; Subsequence Similarity; Substructure Similarity; Very Large Databases; Similarity Search; k-NN Search; Range Search; Approximate Querying; Quantized Index; Multiresolution Search
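The N-gram frequency-vector mapping described above can be sketched simply: every sequence becomes a fixed-length vector of N-gram counts, and a cheap vector distance filters candidates before any exact sequence comparison. The alphabet and N below are illustrative choices for DNA, not the dissertation's exact parameters.

```python
from itertools import product

ALPHABET = "ACGT"
N = 2
# All possible N-grams over the alphabet: the fixed vector dimensions.
NGRAMS = ["".join(p) for p in product(ALPHABET, repeat=N)]  # 16 dims for N=2

def ngram_vector(seq):
    # Map an arbitrary-length sequence to a fixed-length count vector.
    counts = {g: 0 for g in NGRAMS}
    for i in range(len(seq) - N + 1):
        g = seq[i:i + N]
        if g in counts:
            counts[g] += 1
    return [counts[g] for g in NGRAMS]

def l1(u, v):
    # L1 distance between frequency vectors; a single substitution in a
    # sequence perturbs at most a few N-gram counts, so small sequence
    # edits stay close in this vector space.
    return sum(abs(a - b) for a, b in zip(u, v))
```

For instance, `"ACGTACGT"` and `"ACGTACGA"` differ in one trailing base, and their 2-gram vectors differ by an L1 distance of only 2, so a vector-space filter keeps them as mutual candidates.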

Jin, Wei. Graph Pattern Matching, Approximate Matching and Dynamic Graph Indexing
Doctor of Philosophy, Case Western Reserve University, 2011, EECS - Computer and Information Sciences

In recent years, graph pattern matching, approximate graph matching, and dynamic subgraph indexing have become important graph analysis tools due to the emergence of new applications such as computational biology and social network analysis. In this work we investigate these three related problems.

For the first problem: in previously existing models, each edge in the query pattern represents the same relationship, e.g., the two endpoint vertices must be connected, or the distance between them must be within a certain uniform threshold. However, real-world applications may require edges representing different relationships or distances. Therefore, we introduce the flexible pattern matching model, where a range [min_e, max_e] is associated with each edge e in the query pattern, meaning that the minimum distance between the matched endpoints of e must lie in the range [min_e, max_e]. A novel pattern matching algorithm is devised, which consists of several innovations including preprocessing of the query pattern, two types of indices, and a top-k match generation scheme.

For the second problem, we study finding approximate matches of a query graph in a large database graph with (possibly) missing edges. The SAPPER method is proposed, utilizing hybrid neighborhood unit structures in the index. SAPPER also takes advantage of pre-generated random spanning trees and a carefully designed graph enumeration order.

For the third problem: to the best of our knowledge, most subgraph indexing work focuses on static database graphs. However, in real applications, database graphs may change over time. Thus, we propose an indexing structure, BR-index, for large dynamic graphs. The large database graph G is partitioned into a set of overlapping index regions. Features (small subgraphs) are extracted from these regions and used to index them, and updates to G can be localized to a small number of regions. To further improve the efficiency of updates and query processing, several novel techniques and data structures are introduced, including a feature lattice, maximal features, and overlapping regions.

Extensive empirical studies have been conducted to show the effectiveness and efficiency of our indices and methods.

Committee:

Jiong Yang (Advisor); Gultekin Ozsoyoglu (Committee Member); Wojbor Woyczynski (Committee Member); Soumya Ray (Committee Member)

Subjects:

Computer Science

Keywords:

Graph Indexing; Graph Matching; Approximate; Dynamic Graph

Steele, Aaron M. Efficient Private Data Outsourcing
Master of Computer Science, Miami University, 2011, Computer Science & Software Engineering
Data outsourcing provides companies a cost effective method for their data to be stored, managed, and maintained by a third party. Data outsourcing offers many economic benefits, but also introduces several privacy concerns. Many solutions have been proposed for maintaining privacy while outsourcing data. We propose a method that maintains a similar level of privacy while improving upon the query performance of previous solutions. Our scheme is based on the assumption that the data owner possesses a small amount of secure local storage, which can be used as a pseudo-index table. This indexing structure can significantly increase query performance for selection queries involving conjunctions. We also offer approaches for approximating the required storage and provide experimental analysis of both the indexing structure's performance and the storage approximations.

Committee:

Keith Frikken, PhD (Advisor); William Brinkman, PhD (Committee Member); Lukasz Opyrchal, PhD (Committee Member)

Subjects:

Computer Science

Keywords:

Data Outsourcing; Indexing; Privacy

Anichowski, Brian. An Experimental Investigation of the Effect of Spacing Errors on the Loaded Transmission Error of Spur Gear Pairs
Master of Science, The Ohio State University, 2017, Mechanical Engineering
This paper complements recent investigations [Handschuh et al. (2014), Talbot et al. (2016)] of the influence of tooth indexing errors on the dynamic factors of spur gears by presenting data on changes to the dynamic transmission error. An experimental study is performed using an accelerometer-based dynamic transmission error measurement system incorporated into a high-speed gear tester to establish the baseline dynamic behavior of gears having negligible indexing errors, and to characterize changes to this baseline due to the application of tightly controlled, intentional indexing errors. Spur gears having different forms of indexing errors are paired with a gear having negligible indexing error. The dynamic transmission error of gear pairs under these error conditions is measured and examined in both the time and frequency domains to quantify the transient effects induced by these indexing errors. These measurements are then compared against the baseline, no-error condition as a means to quantify the dynamic vibratory behavior induced by the tooth indexing errors. These comparisons indicate clearly that the baseline dynamic response, dominated by well-defined resonance peaks and mesh harmonics, is complemented by non-mesh orders of transmission error due to the transient behavior induced by the indexing errors. In addition, the tooth (or teeth) having indexing error imparts transient effects that dominate the vibratory response of the system for significantly more mesh cycles than those during which the erroneous teeth are in contact. For this reason, along with the results presented in Talbot et al. (2016), it was concluded that spur gears containing indexing errors exhibit significant deviations from nominal behavior, at both the system and time-domain levels.

Committee:

Ahmet Kahraman (Advisor); David Talbot (Committee Member)

Subjects:

Engineering; Mechanical Engineering

Keywords:

Gear, Gears, Spur Gears, Dynamics, Transmission Error, Indexing Error, Manufacturing Error, Vibrations, Noise, Accelerometer, GearLab

Young, Carol Elizabeth. Development of language analysis procedures with application to automatic indexing
Doctor of Philosophy, The Ohio State University, 1973, Graduate School

Committee:

Not Provided (Other)

Subjects:

Computer Science

Keywords:

Automatic indexing; Computer programming

Abu Doleh, Anas. High Performance and Scalable Matching and Assembly of Biological Sequences
Doctor of Philosophy, The Ohio State University, 2016, Electrical and Computer Engineering
Next Generation Sequencing (NGS), a massively parallel and low-cost sequencing technology, is able to generate enormous volumes of sequencing data. This facilitates the discovery of new genomic sequences and expands biological and medical research. However, the big advancements in this technology also bring big computational challenges. In almost all NGS analysis pipelines, the most crucial and computationally intensive tasks are sequence similarity searching and de novo genome assembly. Thus, in this work, we introduce novel and efficient techniques that utilize advancements in High Performance Computing hardware and data computing platforms to accelerate these tasks while producing high-quality results. For sequence similarity search, we have studied utilizing massively multithreaded architectures, such as the Graphics Processing Unit (GPU), to accelerate and solve two important problems: read mapping and maximal exact matching. First, we introduce a new mapping tool, Masher, which processes long (and short) reads efficiently and accurately. Masher employs a novel indexing technique that produces an index for a huge genome, such as the human genome, with a memory footprint small enough that it can be stored and efficiently accessed in a memory-restricted device such as a GPU. The results show that Masher is faster than state-of-the-art tools and obtains good accuracy and sensitivity on sequencing data with various characteristics. Second, the maximal exact matching problem has been studied because of its importance in detecting and evaluating the similarity between sequences. We introduce a novel tool, GPUMEM, which efficiently utilizes the GPU to build a lightweight index and find maximal exact matches between two genome sequences. The index construction is so fast that, even including its time, GPUMEM is faster in practice than state-of-the-art tools that use a pre-built index.
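The maximal exact matching task itself can be illustrated with a simple quadratic, CPU-side sketch (GPUMEM's index-based GPU approach is, of course, far more efficient; this is only a definition-by-example). A maximal exact match (MEM) is an exact match between the two sequences that cannot be extended on either side.

```python
def maximal_exact_matches(s1, s2, min_len):
    """Report all MEMs (start1, start2, length) with length >= min_len.

    O(len(s1) * len(s2)) dynamic program: cur[j+1] holds the length of the
    exact match ending at s1[i], s2[j]. Because the counter resets on every
    mismatch, any reported match is automatically left-maximal; we report it
    only when it cannot be extended to the right either.
    """
    n, m = len(s1), len(s2)
    mems = []
    prev = [0] * (m + 1)
    for i in range(n):
        cur = [0] * (m + 1)
        for j in range(m):
            if s1[i] == s2[j]:
                cur[j + 1] = prev[j] + 1
                # Right-maximal: the match stops at a boundary or a mismatch.
                ended = i + 1 == n or j + 1 == m or s1[i + 1] != s2[j + 1]
                if ended and cur[j + 1] >= min_len:
                    length = cur[j + 1]
                    mems.append((i - length + 1, j - length + 1, length))
        prev = cur
    return mems

print(sorted(maximal_exact_matches("banana", "ananas", 3)))
# -> [(1, 0, 5), (1, 2, 3), (3, 0, 3)]
```

For example, "anana" (length 5) is a MEM of "banana" and "ananas", while its internal "ana" occurrences are not reported separately because they are extendable.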
De novo genome assembly is a crucial step in NGS analysis because of the novelty of the discovered sequences. First, we have studied parallelizing de Bruijn graph based de novo genome assembly on distributed-memory systems using the Spark framework and the GraphX API. We propose a new tool, Spaler, which assembles short reads efficiently and accurately. Spaler starts with de Bruijn graph construction. Then, it applies iterative graph reduction and simplification techniques to generate contigs. After that, Spaler uses the read mapping information to produce scaffolds. Spaler employs a smart parallelism-level tuning technique to improve the performance of each of these steps independently. The experiments show promising results in terms of scalability, execution time, and quality. Second, we address the problem of de novo metagenomic assembly. Spaler may not properly assemble sequenced data extracted from environmental samples because of the complexity and diversity of living microbial communities. Thus, we introduce meta-Spaler, an extension of Spaler, to handle metagenomic datasets. meta-Spaler partitions the reads based on their expected coverage and applies an iterative assembly. The results show an improvement in the assembly quality of meta-Spaler in comparison to that of Spaler.
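The de Bruijn graph construction and contig extraction that such assemblers begin with can be sketched in a heavily simplified serial form (no Spark, no error correction, isolated cycles ignored); this is a generic illustration of the technique, not Spaler's implementation:

```python
from collections import defaultdict

def de_bruijn(reads, k):
    # Nodes are (k-1)-mers; each k-mer occurring in a read adds an edge
    # from its (k-1)-prefix to its (k-1)-suffix.
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred

def contigs(succ, pred):
    # Emit maximal non-branching paths: walk forward from every node that
    # is not simple (exactly one predecessor and one successor). This is a
    # toy stand-in for the iterative reduction passes a real assembler runs.
    def simple(n):
        return len(pred[n]) == 1 and len(succ[n]) == 1
    out = []
    for v in set(succ) | set(pred):
        if not simple(v):
            for w in succ[v]:
                path = [v, w]
                while simple(w):
                    w = next(iter(succ[w]))
                    path.append(w)
                # Spell the contig: first node plus one new base per step.
                out.append(path[0] + "".join(n[-1] for n in path[1:]))
    return out

reads = ["ATGGAA", "GGAAGT", "AAGTCG", "GTCGC"]
print(contigs(*de_bruijn(reads, 4)))  # -> ['ATGGAAGTCGC']
```

With these overlapping reads and k = 4, the graph is a single unambiguous chain, so the walk reconstructs the original sequence as one contig; branching caused by repeats or sequencing errors is exactly what the reduction and simplification stages of a real assembler must resolve.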

Committee:

Umit Catalyurek (Advisor); Kun Huang (Committee Member); Fusun Ozguner (Committee Member)

Subjects:

Bioinformatics; Computer Engineering

Keywords:

bioinformatics; sequence similarity; indexing; graphical processing unit; Apache Spark; de Bruijn graph; de novo assembly; metagenomics

Lay, William Michael. The Double-KWIC coordinate indexing technique: theory, design, and implementation
Doctor of Philosophy, The Ohio State University, 1973, Graduate School

Committee:

Not Provided (Other)

Subjects:

Computer Science

Keywords:

Automatic indexing