Linked data has grown rapidly in recent years because of its ability to interlink disparate sources, made possible by machine-processable RDF data. Today, a large number of organizations, including governments and news providers, publish data in RDF format, inviting developers to build useful applications through reuse and integration of structured data. This has led to a tremendous increase in the amount of RDF data on the web. Although the growth of RDF data can be viewed as a positive sign for semantic web initiatives, it causes performance bottlenecks for the RDF data management systems that store the data and provide access to it. In addition, the growing number of ontologies and vocabularies makes retrieving data a challenging task.
The aim of this research is to show how alignments in Linked Data can be exploited to compress and query linked datasets. First, we introduce two compression techniques that compress RDF datasets by identifying and removing semantic and contextual redundancies in linked data. Logical Linked Data Compression is a lossless technique that compresses a dataset by generating a set of new logical rules from the dataset and removing triples that can be inferred from these rules. Contextual Linked Data Compression is a lossy technique that compresses datasets by performing schema alignment and instance matching, followed by pruning of alignments based on a confidence value and subsequent grouping of equivalent terms. Depending on the structure of the dataset, the first technique was able to prune more than 50% of the triples. Second, we propose an Alignment based Linked Open Data Querying System (ALOQUS) that allows users to write query statements using concepts and properties not present in the linked datasets, and we show that querying does not require a thorough understanding of the individual datasets and their interconnecting relationships. Finally, we present LinkGen, a multipurpose synthetic Linked Data generator that generates a large amount of repeatable and reproducible RDF data using statistical distributions and interlinks it with real-world entities using alignments.
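To make the idea behind the lossless technique concrete, the following is a minimal sketch of removing triples that a rule can re-derive, and of restoring them afterwards. It is an illustration under stated assumptions, not the method developed in this work: the rule shape (one predicate-object pair implying another for the same subject), the names `Rule`, `compress`, and `decompress`, and the sample data are all hypothetical, and rules are supplied by hand rather than generated from the dataset.

```python
from typing import NamedTuple

Triple = tuple[str, str, str]  # (subject, predicate, object)


class Rule(NamedTuple):
    """Hypothetical rule shape: if (s, *body) holds, then (s, *head) holds."""
    body: tuple[str, str]
    head: tuple[str, str]


def rule_holds(rule: Rule, triples: set[Triple]) -> bool:
    # A rule is usable for lossless compression only if it has no exceptions.
    return all((s, *rule.head) in triples
               for (s, p, o) in triples if (p, o) == rule.body)


def compress(triples: set[Triple], rules: list[Rule]) -> tuple[set[Triple], list[Rule]]:
    """Drop every triple that a valid rule can re-derive from a kept triple."""
    kept = set(triples)
    used = []
    for rule in rules:
        if rule.body == rule.head or not rule_holds(rule, triples):
            continue  # skip trivial rules and rules with exceptions
        used.append(rule)
        for (s, p, o) in triples:
            # Only remove the head while its supporting body triple is still kept.
            if (p, o) == rule.body and (s, p, o) in kept:
                kept.discard((s, *rule.head))
    return kept, used


def decompress(kept: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Re-apply the rules until no new triple can be inferred."""
    restored = set(kept)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for (s, p, o) in list(restored):
                inferred = (s, *rule.head)
                if (p, o) == rule.body and inferred not in restored:
                    restored.add(inferred)
                    changed = True
    return restored


# Hypothetical sample data: every ex:Student is also an ex:Person.
triples = {
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
}
rules = [Rule(body=("rdf:type", "ex:Student"), head=("rdf:type", "ex:Person"))]

kept, used = compress(triples, rules)
```

In this sketch the compressed form consists of the kept triples together with the rules that were applied, and `decompress(kept, used)` recovers the original set. The example leaves out how rules are generated and how the cost of storing them is accounted for.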