Search Results (1 - 4 of 4 Results)

Joshi, Amit Krishna. Exploiting Alignments in Linked Data for Compression and Query Answering
Doctor of Philosophy (PhD), Wright State University, 2017, Computer Science and Engineering PhD
Linked data has experienced accelerated growth in recent years due to its interlinking ability across disparate sources, made possible via machine-processable RDF data. Today, a large number of organizations, including governments and news providers, publish data in RDF format, inviting developers to build useful applications through reuse and integration of structured data. This has led to a tremendous increase in the amount of RDF data on the web. Although the growth of RDF data can be viewed as a positive sign for semantic web initiatives, it causes performance bottlenecks for RDF data management systems that store and provide access to data. In addition, a growing number of ontologies and vocabularies make retrieving data a challenging task. The aim of this research is to show how alignments in Linked Data can be exploited to compress and query linked datasets. First, we introduce two compression techniques that compress RDF datasets through identification and removal of semantic and contextual redundancies in linked data. Logical Linked Data Compression is a lossless compression technique which compresses a dataset by generating a set of new logical rules from the dataset and removing triples that can be inferred from these rules. Contextual Linked Data Compression is a lossy compression technique which compresses datasets by performing schema alignment and instance matching, followed by pruning of alignments based on confidence values and subsequent grouping of equivalent terms. Depending on the structure of the dataset, the first technique was able to prune more than 50% of the triples. Second, we propose an Alignment based Linked Open Data Querying System (ALOQUS) that allows users to write query statements using concepts and properties not present in linked datasets, and show that querying does not require a thorough understanding of the individual datasets and their interconnecting relationships. Finally, we present LinkGen, a multipurpose synthetic Linked Data generator that generates large amounts of repeatable and reproducible RDF data using statistical distributions, and interlinks it with real-world entities using alignments.
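
A minimal sketch may help make the logical compression idea concrete. The Python code below, using rdflib, mines rules of the form "every subject with (p1, o1) also has (p2, o2)", keeps those rules alongside a residual graph, and drops the triples the rules can re-derive. The function names and the brute-force miner are illustrative assumptions, not the thesis implementation, which would mine rules far more efficiently and with support thresholds.

```python
# Illustrative sketch only: brute-force rule mining plus triple pruning.
from rdflib import Graph

def mine_simple_rules(g):
    """Find rules (?s p1 o1) => (?s p2 o2) that hold for every subject
    carrying the antecedent; quadratic in the number of (p, o) pairs."""
    pairs = {(p, o) for _, p, o in g}
    rules = []
    for ant in pairs:
        subjects = {s for s, p, o in g if (p, o) == ant}
        for cons in pairs - {ant}:
            if subjects and all((s, cons[0], cons[1]) in g for s in subjects):
                rules.append((ant, cons))
    return rules

def compress(g):
    """Drop triples a rule can re-derive, but never drop antecedent triples,
    so every rule can still fire during decompression (lossless)."""
    rules = mine_simple_rules(g)
    protected = {(s, p, o) for s, p, o in g
                 if any((p, o) == ant for ant, _ in rules)}
    residual = Graph()
    for s, p, o in g:
        derivable = any((s, ap, ao) in g and (p, o) == (cp, co)
                        for (ap, ao), (cp, co) in rules)
        if (s, p, o) in protected or not derivable:
            residual.add((s, p, o))
    return residual, rules

def decompress(residual, rules):
    """Re-apply each rule once; antecedents were kept, so this restores
    every pruned triple."""
    g = Graph()
    for t in residual:
        g.add(t)
    for (ap, ao), (cp, co) in rules:
        for s, p, o in residual:
            if (p, o) == (ap, ao):
                g.add((s, cp, co))
    return g
```

Keeping antecedent triples out of the pruning step is what makes the round trip lossless in this sketch: every mined rule can still fire during decompression.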

Committee:

Pascal Hitzler, Ph.D. (Advisor); Guozhu Dong, Ph.D. (Committee Member); Krishnaprasad Thirunarayan, Ph.D. (Committee Member); Michelle Cheatham, Ph.D. (Committee Member); Subhashini Ganapathy, Ph.D. (Committee Member)

Subjects:

Computer Science

Keywords:

Linked Data; RDF Compression; Ontology Alignment; Linked Data Querying; Synthetic RDF Generator; SPARQL

Gudivada, Ranga Chandra. Discovery and Prioritization of Biological Entities Underlying Complex Disorders by Phenome-Genome Network Integration
PhD, University of Cincinnati, 2007, Engineering: Biomedical Engineering
An important goal for biomedical research is to elucidate causal and modifier networks of human disease. While integrative functional genomics approaches have shown success in the identification of biological modules associated with normal and disease states, a critical bottleneck is representing knowledge capable of encompassing asserted or derivable causality mechanisms. Both single gene and more complex multifactorial diseases often exhibit several phenotypes, and a variety of approaches suggest that phenotypic similarity between diseases can be a reflection of shared activities of common biological modules composed of interacting or functionally related genes. Thus, analyzing the overlaps and interrelationships of clinical manifestations of a series of related diseases may provide a window into the complex biological modules that lead to a disease phenotype. In order to evaluate our hypothesis, we are developing a systematic and formal approach to extract phenotypic information present in textual form within the Online Mendelian Inheritance in Man (OMIM) and Syndrome DB databases to construct a disease-clinical phenotypic feature matrix to be used by various clustering procedures to find similarity between diseases. Our objective is to demonstrate relationships detectable across a range of disease concept types modeled in UMLS and to analyze the clinical overlaps of several Cardiovascular Syndromes (CVS) in OMIM, in order to find associations between phenotypic clusters and the functions of the underlying genes and pathways. Most current biomedical knowledge is spread across different databases in different formats, and mining these datasets leads to large and unmanageable results. Semantic Web principles and standards provide an ideal platform to integrate such heterogeneous information and could allow the detection of implicit relations and the formulation of interesting hypotheses. We implemented a page-ranking algorithm on the Semantic Web to prioritize biological entities by their relative contribution and relevance, which can be combined with this clustering approach. In this way, disease-gene, disease-pathway, or disease-process relationships can be prioritized by mining a phenome-genome framework that not only discovers but also determines the importance of the resources by making queries over higher-order relationships of multi-dimensional data that reflect the feature complexity of diseases.
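
As a rough, hypothetical illustration of the prioritization step, the sketch below builds a tiny disease-phenotype-gene-pathway graph and runs personalized PageRank (via networkx) seeded at one disease; the nodes, edges, and parameters are invented for illustration and are not data or results from the thesis.

```python
# Illustrative toy graph: disease - phenotype - gene - pathway edges.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Long QT syndrome", "prolonged QT interval"),   # disease - phenotype
    ("prolonged QT interval", "KCNQ1"),              # phenotype - gene
    ("prolonged QT interval", "KCNH2"),
    ("KCNQ1", "cardiac ion channel pathway"),        # gene - pathway
    ("KCNH2", "cardiac ion channel pathway"),
    ("Marfan syndrome", "aortic dilatation"),
    ("aortic dilatation", "FBN1"),
    ("FBN1", "TGF-beta signaling pathway"),
])

# Personalized PageRank seeded at the disease of interest: genes and pathways
# reachable through shared phenotypes are ranked highest.
seed = {n: (1.0 if n == "Long QT syndrome" else 0.0) for n in G}
scores = nx.pagerank(G, alpha=0.85, personalization=seed)
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {node}")
```

In a real phenome-genome network the edges would come from OMIM-derived phenotype clusters and curated gene and pathway annotations rather than a hand-made list.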

Committee:

Dr. Bruce Aronow (Advisor)

Keywords:

Semantic Web; RDF; OWL; SPARQL; Ontology; Biomedical Informatics; Bioinformatics; Integrative Bioinformatics; Text Mining; Phenome; Genome; Disease Modularity; Data Integration; Semantic Integration

Huster, Todd. OWL query answering using machine learning
Master of Science (MS), Wright State University, 2015, Computer Science
The formal semantics of the Web Ontology Language (OWL) enables automated reasoning over OWL knowledge bases, which in turn can be used for a variety of purposes including knowledge base development, querying and management. Automated reasoning is usually done by means of deductive (proof-theoretic) algorithms which are either provably sound and complete or employ approximate methods to trade some correctness for improved efficiency. As has been argued elsewhere, however, reasoning methods for the Semantic Web do not necessarily have to be based on deductive methods, and approximate reasoning using statistical or machine-learning approaches may bring improved speed while maintaining high precision and recall, and may furthermore be more robust towards errors in the knowledge base and logical inconsistencies. In this thesis, we show that it is possible to learn a linear-time classifier that closely approximates deductive OWL reasoning in some settings. In particular, we specify a method for extracting feature vectors from OWL ontologies that enables the ID3 and AdaBoost classifiers to approximate OWL query answering for single answer variable queries. Amongst other ontologies, we evaluate our approach using the LUBM benchmark and the DCC ontology (a large real-world dataset about traffic in Dublin) and show considerable improvement over previous efforts.
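
The following is a minimal sketch of that recipe under assumptions of our own; the feature names, toy individuals, and labels are invented, and only an ID3-style (entropy-split) decision tree is shown. Each individual is encoded as a binary vector over asserted classes and properties, labeled with the answer a deductive reasoner would give for a fixed single-variable query, and the trained tree then answers for unseen individuals in time linear in its depth.

```python
# Hypothetical sketch: features, individuals, and labels are invented.
from sklearn.tree import DecisionTreeClassifier

feature_names = ["asserted GraduateStudent", "asserted takesCourse",
                 "asserted memberOf Department", "asserted FullProfessor"]

X_train = [  # one binary feature vector per training individual
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
y_train = [1, 1, 1, 0, 0]  # 1 = a deductive reasoner says "?x a Student" holds

clf = DecisionTreeClassifier(criterion="entropy")  # entropy splits, as in ID3
clf.fit(X_train, y_train)

# Classifying a new individual is one pass down the tree.
print(clf.predict([[0, 1, 0, 0]]))  # approximate answer for an unseen individual
```

The quality of such an approximation hinges on the feature extraction and on how well the training labels, produced by a sound and complete reasoner, cover the ontology's structure.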

Committee:

Pascal Hitzler, Ph.D. (Advisor); Michelle Cheatham, Ph.D. (Committee Member); John Gallagher, Ph.D. (Committee Member)

Subjects:

Artificial Intelligence; Computer Science

Keywords:

approximate reasoning; OWL query answering; Semantic Web; SPARQL; ontology; description logic

Patni, Harshal Kamlesh. Real Time Semantic Analysis of Streaming Sensor Data
Master of Science (MS), Wright State University, 2011, Computer Science
The emergence of dynamic information sources such as social media, mobile devices, and sensors has led to enormous streams of real-time data on the web, also called the era of Big Data [1]. Research studies suggest that these dynamic networks have created more data in the last three years than in the entire history of civilization, and this trend will only increase in the coming years [1]. A Gigaom article on Big Data shows how the total information generated by these dynamic information sources has completely surpassed the total storage capacity. Keeping in mind the problem of ever-increasing data, this thesis focuses on semantically integrating and analyzing multiple, multimodal, heterogeneous streams of weather data with the goal of creating meaningful thematic abstractions in real time. This is accomplished by implementing an infrastructure for creating and mining thematic abstractions over massive amounts of real-time sensor streams. The evaluation shows a 69% data reduction with this approach.
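
A hypothetical sketch of the annotate-then-abstract pipeline is given below; the namespaces, property names, thresholds, and readings are invented rather than taken from the thesis. Raw readings in a time window are lifted to SSN-style RDF observations and then collapsed into a single thematic abstraction, which is where the data reduction comes from.

```python
# Illustrative sketch only: invented sensor readings and thresholds.
from rdflib import Graph, Literal, Namespace, RDF

SSN = Namespace("http://purl.oclc.org/NET/ssnx/ssn#")
EX = Namespace("http://example.org/weather#")

window = [  # (sensor, observed property, value) arriving in one time window
    ("station42", "windSpeedMph", 42.0),
    ("station42", "snowfallInchesPerHour", 1.4),
    ("station42", "temperatureF", 18.0),
]

raw = Graph()
for i, (sensor, prop, value) in enumerate(window):
    obs = EX[f"obs{i}"]
    raw.add((obs, RDF.type, SSN.Observation))       # SSN-style annotation
    raw.add((obs, SSN.observedBy, EX[sensor]))
    raw.add((obs, SSN.observedProperty, EX[prop]))
    raw.add((obs, SSN.hasValue, Literal(value)))

# Thematic abstraction: one higher-level event replaces the window of raw
# observations when simple (invented) thresholds are met.
values = {prop: value for _, prop, value in window}
abstraction = Graph()
if values.get("windSpeedMph", 0) > 35 and values.get("snowfallInchesPerHour", 0) > 1:
    abstraction.add((EX.event1, RDF.type, EX.BlizzardCondition))
    abstraction.add((EX.event1, SSN.observedBy, EX.station42))

print(f"{len(raw)} raw triples -> {len(abstraction)} abstraction triples")
```

Downstream queries can then run against the compact abstractions instead of every raw observation, which is the source of the reported data reduction.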

Committee:

Amit Sheth, PhD (Advisor); Ramakanth Kavuluru, PhD (Committee Member); Krishnaprasad Thirunarayan, PhD (Committee Member)

Subjects:

Computer Science; Geographic Information Science

Keywords:

Semantic Web; Semantic Sensor Web; Real-Time Sensor Web; RDF; SPARQL; SSN Ontology; Open Geospatial Consortium; Sensor Web Enablement; Observations and Measurements