Doctor of Philosophy, Case Western Reserve University, 2010, EECS - Computer and Information Sciences
Until fairly recently, single nucleotide polymorphisms (SNPs) were thought to be the main source of variation in the human genome. With the advent of high-throughput genome scanning technologies, it has been revealed that there are other forms of genomic variation beyond single base-pair substitutions. These structural alterations include insertions, deletions, inversions, translocations, tandem repeats of DNA sequences and copy number variants (CNVs). Collectively, these alterations are referred to as structural variations.
CNVs are segments of the genome that are polymorphic with regard to genomic copy number. Copy number polymorphisms (CNPs), which can be considered a specific category of CNVs, are defined as copy number variants that are present, with identical boundaries (and are therefore likely identical-by-descent), in at least 1% of the human population. Tandem repeats, on the other hand, are serially repeated segments of the human genome, which may have repeat units several hundred kilobases in size.
CNVs, which have been shown to have a role in various diseases such as Alzheimer's disease, Crohn's disease, autism and schizophrenia, can be caused by various structural mutations such as duplications and deletions. In the effort to scan the entire genome of human populations, as well as individuals, for CNVs, CNPs and tandem repeats, SNP arrays and paired-end sequence mapping data have emerged as important tools.
In this thesis, we study the problem of identifying CNVs, CNPs and tandem repeats from these data sources. We first frame CNV identification as an optimization problem with an objective function that is explicitly designed so that its optimal solution is the most accurate set of CNV calls. Our method, termed COKGEN, searches for the best solution using a variant of the well-known simulated annealing heuristic. Next, we present a method for identifying and genotyping common CNPs. The proposed method, POLYGON, draws strength fr...
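To illustrate the general idea of treating CNV calling as an optimization problem solved by simulated annealing, here is a minimal sketch. It is not COKGEN: the abstract does not specify the objective function or the annealing variant, so everything below is an assumption made for illustration. The state means in `STATE_MEANS`, the `breakpoint_penalty`, the cooling schedule, and the toy `signal` are all hypothetical. The objective rewards agreement between each marker's intensity and its assigned copy-number state, and penalizes every change of state along the genome so that calls form contiguous segments.

```python
import math
import random

# Hypothetical expected signal level per copy-number state (toy values):
# -1 = loss, 0 = normal, 1 = gain.
STATE_MEANS = {-1: -0.5, 0: 0.0, 1: 0.4}


def energy(signal, states, breakpoint_penalty=0.5):
    """Toy objective: squared misfit plus a penalty per state change."""
    misfit = sum((x - STATE_MEANS[s]) ** 2 for x, s in zip(signal, states))
    breakpoints = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return misfit + breakpoint_penalty * breakpoints


def anneal(signal, steps=20000, t_start=1.0, t_end=0.001, seed=0):
    """Plain simulated annealing over per-marker state assignments."""
    rng = random.Random(seed)
    current = [0] * len(signal)  # start from "all markers normal"
    current_e = energy(signal, current)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Geometric cooling from t_start down toward t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose changing the state of one randomly chosen marker.
        i = rng.randrange(len(signal))
        proposal = list(current)
        proposal[i] = rng.choice([s for s in STATE_MEANS if s != current[i]])
        proposal_e = energy(signal, proposal)
        delta = proposal_e - current_e
        # Metropolis rule: always accept improvements, and accept
        # worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_e = proposal, proposal_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


# Toy per-marker intensities (assumed data, not from the thesis).
signal = [0.02, -0.05, 0.01, -0.48, -0.55, -0.51,
          0.03, 0.00, 0.42, 0.38, 0.45, -0.02]
calls, score = anneal(signal)
print(calls, round(score, 3))
```

The function returns the lowest-energy assignment seen during the run, so the result is never worse than the all-normal starting point under this toy objective. A real method would need an objective calibrated to the array platform and a more careful choice of moves and cooling schedule.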
Committee: Mehmet Koyuturk (Committee Chair); Thomas LaFramboise (Committee Member); Meral Ozsoyoglu (Committee Member); Jing Li (Committee Member)
Subjects: Bioinformatics; Computer Science