Doctor of Philosophy, Case Western Reserve University, 2016, EECS - Computer and Information Sciences
Providing an efficient and expressive querying technique for graph-structured RDF data is an emerging problem, as large amounts of RDF data are now available from applications in many areas. Current techniques do not fully satisfy this goal because the RDF model requires both highly flexible use of keywords and structural expressions in the query language. Viewing RDF as graphs requires additional graph-based functionality, such as querying a path or a tree connection. We propose a querying framework, called RDF-h, which uses the query template as its basic query unit and supports both partially entered keywords and query conditions based on graph structure. To provide efficient query evaluation, a signature-based index is used.
Most existing techniques that use a signature-based index claim benefits on all datasets and queries. However, the effectiveness of signature-based pruning varies greatly among RDF datasets and is closely related to dataset characteristics. The performance benefit of signature-based pruning depends not only on the size of the RDF graph, but also on the underlying graph structure and the complexity of the queries. We propose several dataset evaluation metrics, namely coverage and coherence, relationship specialty, and literal diversity, to understand the query performance differences among real and synthetic RDF datasets. Based on these results, we further propose an application-specific framework, called RBench, to generate RDF benchmarks.
By evaluating the characteristics of RDF datasets and the complexity of query templates, RDF-h selectively applies signature-based pruning when it is expected to be beneficial. Two aspects of the RDF-h framework are evaluated in experiments: (1) extensive query performance evaluation based on randomly generated queries over different datasets; and (2) use of RDF-h for biomedical applications. For random query evaluation, the RDF-h algorithm can automatically capture freque...
Committee: Meral Özsoyoglu (Advisor); Gultekin Özsoyoglu (Committee Member); Mehmet Koyutürk (Committee Member); Marc Buchner (Committee Member); Soumya Ray (Committee Member); Andy Podgurski (Committee Member); Xiang Zhang (Committee Member)
Subjects: Computer Science