Doctor of Philosophy, Case Western Reserve University, 2021, EECS - Computer and Information Sciences
Due to advances in experimental techniques, modern biological research provides an extensive and diverse set of data for computational analyses.
At the genomic level, high-throughput sequencing is capable of producing massive amounts of patient-specific genetic information.
Moving toward the fields of proteomics and cellular signaling, biological association data such as protein interactions are frequently studied through a graph-theoretical model.
These protein-protein interaction networks (PPIs) can then be extended with other forms of network data, such as expression quantitative trait loci (eQTL) and disease associations, resulting in expansive heterogeneous networks.
Furthermore, these networks are often tissue-specific, altogether representing a massive number of semantically useful network variations.
This motivates the development of efficient compressed data structures and algorithms for working with versioned biological network data.
In this dissertation I present algorithms and data structures for efficiently compressing and querying biological data in real time.
LinDen is a method for detecting epistatically interacting loci in genome-wide association study (GWAS) data. By hierarchically compressing loci according to the linkage disequilibrium between them, it is possible to perform a highly accurate heuristic search for epistatically interacting locus pairs.
VerTIoN is a compressed, versioned sparse graph data structure applied to the storage, retrieval, and integration of heterogeneous tissue-specific networks, including protein interactions, eQTL interactions, and disease associations. I show that this method substantially improves the storage efficiency of tissue-specific network data while allowing fast decompression and composition.
Finally, the work with VerTIoN is extended by using it as the back end of a multi-user versioned network query engine, enabling arbitrary on-the-fly version composition.
To demonstrate the …
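The idea of storing many tissue-specific versions of one network compactly, and composing versions on demand, can be illustrated with a small sketch. This is not VerTIoN's actual data structure, which the abstract does not specify. It assumes a simple delta encoding (one shared base edge set plus per-version additions and removals) and treats composition as a union of versions. The class name `VersionedGraph`, its methods, and the node and tissue names are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def norm(u: str, v: str) -> Edge:
    """Store each undirected edge once, with endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


class VersionedGraph:
    """Hypothetical delta-encoded store (an assumption, not VerTIoN itself):
    a shared base edge set plus, for each version, only the edges that
    version adds to or removes from the base."""

    def __init__(self, base: Iterable[Edge]) -> None:
        self.base: FrozenSet[Edge] = frozenset(norm(u, v) for u, v in base)
        self.added: Dict[str, Set[Edge]] = {}
        self.removed: Dict[str, Set[Edge]] = {}

    def add_version(self, name: str, edges: Iterable[Edge]) -> None:
        """Record a version by its difference from the base."""
        full = {norm(u, v) for u, v in edges}
        self.added[name] = full - self.base
        self.removed[name] = self.base - full

    def edges(self, name: str) -> Set[Edge]:
        """Reconstruct ("decompress") the full edge set of one version."""
        return (set(self.base) - self.removed[name]) | self.added[name]

    def compose(self, names: Iterable[str]) -> Set[Edge]:
        """Union of the named versions, computed on demand (assumed semantics)."""
        result: Set[Edge] = set()
        for name in names:
            result |= self.edges(name)
        return result

    def stored_edges(self) -> int:
        """Number of edge records actually kept in memory."""
        deltas = sum(len(s) for s in self.added.values())
        deltas += sum(len(s) for s in self.removed.values())
        return len(self.base) + deltas


# Toy data: node and tissue names are placeholders, not real interactions.
g = VersionedGraph([("A", "B"), ("B", "C"), ("C", "D")])
g.add_version("liver", [("A", "B"), ("B", "C"), ("C", "E")])
g.add_version("brain", [("B", "A"), ("C", "D"), ("B", "C")])

liver_edges = g.edges("liver")
combined = g.compose(["liver", "brain"])
```

In this toy case the "brain" version is identical to the base and so costs no extra storage, while "liver" is stored as one added and one removed edge. Real tissue-specific networks would need a far more compact encoding than Python sets, but the trade-off is the same: the more the versions overlap, the less each one costs to store.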
Committee: Mehmet Koyutürk (Committee Chair); Jing Li (Committee Member); Yinghui Wu (Committee Member); Rong Xu (Committee Member)
Subjects: Bioinformatics; Computer Science