Doctor of Philosophy, Case Western Reserve University, 2009, EECS - Computer and Information Sciences
With the advent of high-throughput experimental and genome sequencing technologies, the volume of biological data and of the related literature has increased dramatically. A significant portion of this data has revealed genotypic features of many model organisms. An outstanding problem is to map the characterized genotypic features of organisms to their phenotypic properties, with the ultimate goal of making high-impact scientific discoveries in areas such as diagnosing and curing diseases, engineering genomes, and inventing drugs. Three major challenges in managing and analyzing the available data are: (i) high volume (e.g., thousands of genes, millions of publications), (ii) increasing diversity (e.g., genes, pathways, metabolic profiles), and (iii) high complexity (e.g., hierarchical organization of entities, graph structures, text/image data). Hence, efficient and effective biological data analysis and mining tools that can keep pace with the increasing rate of biological data production are highly desirable.
In this thesis, we study four biological data mining and analysis problems to better understand the underlying biological phenomena. Our contributions address distinct keystones on the path from genotype (e.g., genes and their functionality annotations) to phenotype (e.g., changes in metabolite concentration levels, physiological conditions). More specifically, at the textual-knowledge level, we investigate the automated functionality annotation of individual genomic entities from biomedical articles through text mining. Next, at the annotation (ontology) level, we study how functional annotations of individual genomic entities form templates in the context of their pathways, with applications to pathway mining and categorization. Then, we generalize the problem of discovering frequent pathway functionality templates into a purely computer science problem, namely, that of mining taxonomy-…
Committee: Gultekin Ozsoyoglu PhD (Advisor); Mark Adams PhD (Committee Member); Mehmet Koyuturk PhD (Committee Member); Jing Li PhD (Committee Member); Meral Ozsoyoglu PhD (Committee Member)
Subjects: Bioinformatics; Computer Science