Search Results (1 - 25 of 280 Results)

Liu, Yating
Motif Selection via a Tabu Search Solution to the Set Cover Problem
Master of Science (MS), Ohio University, 2017, Computer Science (Engineering and Technology)
Transcription factors (TFs) regulate gene expression through interaction with specific DNA regions, called transcription factor binding sites (TFBSs). Identifying TFBSs can help in understanding the mechanisms of gene regulation and the biology of human diseases. Motif discovery is the traditional method for discovering TFBSs. However, current motif discovery tools tend to generate a number of motifs that is too large to permit biological validation. To address this problem, the motif selection problem is introduced. The aim of the motif selection problem is to select, from the discovered motifs, a small set that covers a high percentage of the genomic input sequences. Tabu search, a metaheuristic search method based on local search, is introduced to solve the motif selection problem. The performance of the three proposed motif selection methods, tabu-SCP, tabu-PSC and tabu-PNPSC, was evaluated by applying them to ChIP-seq data from the ENCyclopedia of DNA Elements (ENCODE) project. Motif selection was performed on 46 factor groups, which include 158 human ChIP-seq data sets. The results of the three motif selection methods were compared with those of the greedy method, the enrichment method and relaxed integer linear programming (RILP). Tabu-PNPSC selected the smallest set of motifs with the highest overall accuracy. The average number of selected motifs was 1.37 and the average accuracy was 72.47%. Tabu-PNPSC was also used to identify putative regulatory element binding sites that respond to the overproduction of the small RNA RyfA1 in the bacterium Shigella dysenteriae. Six motifs were selected by tabu-PNPSC and the overall accuracy was 75.5%.

Committee:

Lonnie Welch (Advisor)

Subjects:

Bioinformatics; Computer Science

Keywords:

motif selection; tabu search; set cover problem
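
The abstract above frames motif selection as a set cover problem and names a greedy method as one of its comparison baselines. The following is a minimal sketch of such a greedy baseline, not the thesis's tabu search methods. The function name `greedy_motif_cover`, the `target_coverage` parameter, and the toy data are all assumptions made for illustration.

```python
def greedy_motif_cover(motif_hits, n_sequences, target_coverage=0.9):
    """Greedy set cover sketch: repeatedly pick the motif that covers the
    most not-yet-covered sequences until the coverage target is reached.

    motif_hits maps a motif name to the set of sequence indices it occurs in
    (hypothetical input format, not taken from the thesis).
    """
    covered = set()
    selected = []
    remaining = dict(motif_hits)
    while remaining and len(covered) < target_coverage * n_sequences:
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining motif adds any new sequence
        selected.append(best)
        covered |= gain
    return selected, len(covered) / n_sequences


# Toy example: four candidate motifs over six input sequences.
hits = {
    "motif_A": {0, 1, 2, 3},
    "motif_B": {3, 4},
    "motif_C": {4, 5},
    "motif_D": {0, 5},
}
selected, coverage = greedy_motif_cover(hits, n_sequences=6, target_coverage=1.0)
print(selected, coverage)
```

A local search method such as tabu search would typically start from a solution like this one and try swapping motifs in and out while forbidding recently reversed moves; that refinement is not shown here.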

Zhang, Yingxiao
Genetic Engineering of Rubber Producing Dandelions
Doctor of Philosophy, The Ohio State University, 2016, Horticulture and Crop Science
Natural rubber (cis-1,4-polyisoprene) is a biopolymer of significance used in both manufacturing and our daily lives. Unfortunately, the current rubber production system, based on the Para rubber tree (Hevea brasiliensis), is unsustainable due to increasing costs of manual latex collection, competition with other cash crops, and the pervasive threat of South American Leaf Blight, a fatal fungal disease. It is imperative to develop alternative rubber-producing crops. Rubber dandelion (Taraxacum kok-saghyz, TK) and Taraxacum brevicorniculatum (TB) are dandelion species which produce rubber in roots and have several desirable agronomic characteristics. TK is currently under development as an alternative rubber producing crop while TB is a model species for rubber biosynthesis. TK domestication will inevitably involve the introduction of novel traits through breeding or genetic modifications. To develop tools to monitor the potential gene flow between TK and its ubiquitous weedy relative, common dandelion (Taraxacum officinale, TO), chloroplast genomes have been sequenced for TK, TB and TO, and chloroplast and nuclear species-specific markers have been developed and validated. The genomic and marker resources generated here provide a molecular tool kit for germplasm identification and gene flow studies. To advance crop improvement efforts by biotechnology, a rapid and hormone-free Agrobacterium rhizogenes-mediated transformation system was developed for TK and TB. By using root fragments as explants, non-composite transgenic plants were obtained within 8 weeks and the average transformation efficiency for TK and TB was 24.7% and 15.7%, respectively. Protocols developed here were used to transform TK and TB with rubber biosynthesis genes. The rate-limiting enzyme in the mevalonate pathway (MVA pathway), 3-hydroxy-3-methyl-glutaryl-coenzyme A reductase (HMGR), was introduced into TK and TB.
Six genes encoding the entire MVA pathway were introduced into TK and the corresponding enzymes were localized to the chloroplast. Transgenic plants generated here will be used for metabolic analysis to understand genetic regulation of rubber biosynthesis. Due to the rapid development of novel biotechnologies, precise gene manipulation methods were also developed. A fast pipeline was developed to apply genome editing to TK using CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR associated protein 9). In parallel, preliminary attempts were made to manipulate the plastid genome using plastid engineering. Overall, this research will facilitate biogenesis studies, as well as domestication and commercialization of rubber producing dandelions.

Committee:

Katrina Cornish, Dr. (Advisor); Joshua Blakeslee, Dr. (Advisor); John Cardina, Dr. (Committee Member); Feng Qu, Dr. (Committee Member)

Subjects:

Bioinformatics; Biology; Cellular Biology; Ecology; Horticulture; Molecular Biology; Plant Biology; Plant Sciences

Keywords:

natural rubber; rubber-producing dandelions; Taraxacum kok-saghyz; Taraxacum brevicorniculatum; Taraxacum officinale; chloroplast genome; species-specific molecular markers; genetic engineering; CRISPR genome editing; plastid engineering

Abu Doleh, Anas
High Performance and Scalable Matching and Assembly of Biological Sequences
Doctor of Philosophy, The Ohio State University, 2016, Electrical and Computer Engineering
Next Generation Sequencing (NGS), the massively parallel and low-cost sequencing technology, is able to generate an enormous volume of sequencing data. This facilitates the discovery of new genomic sequences and expands biological and medical research. However, these big advancements in the technology also bring big computational challenges. In almost all NGS analysis pipelines, the most crucial and computationally intensive tasks are sequence similarity searching and de novo genome assembly. Thus, in this work, we introduced novel and efficient techniques that utilize the advancements in High Performance Computing hardware and data computing platforms in order to accelerate these tasks while producing high quality results. For the sequence similarity search, we have studied utilizing massively multithreaded architectures, such as the Graphical Processing Unit (GPU), in accelerating and solving two important problems: read mapping and maximal exact matching. Firstly, we introduced a new mapping tool, Masher, which processes long (and short) reads efficiently and accurately. Masher employs a novel indexing technique that produces an index for a huge genome, such as the human genome, with a small memory footprint such that it could be stored and efficiently accessed in a restricted-memory device such as a GPU. The results show that Masher is faster than state-of-the-art tools and obtains good accuracy and sensitivity on sequencing data with various characteristics. Secondly, the maximal exact matching problem has been studied because of its importance in detecting and evaluating the similarity between sequences. We introduced a novel tool, GPUMEM, which efficiently utilizes the GPU in building a lightweight index and finding maximal exact matches between two genome sequences. The index construction is so fast that even when its time is included, GPUMEM is faster in practice than state-of-the-art tools that use a pre-built index.
De novo genome assembly is a crucial step in NGS analysis because of the novelty of discovered sequences. Firstly, we have studied parallelizing de Bruijn graph based de novo genome assembly on distributed memory systems using the Spark framework and the GraphX API. We proposed a new tool, Spaler, which assembles short reads efficiently and accurately. Spaler starts with the de Bruijn graph construction. Then, it applies iterative graph reduction and simplification techniques to generate contigs. After that, Spaler uses the read mapping information to produce scaffolds. Spaler employs a smart parallelism level tuning technique to improve the performance in each of these steps independently. The experiments show promising results in terms of scalability, execution time and quality. Secondly, we addressed the problem of de novo metagenomics assembly. Spaler may not properly assemble sequenced data extracted from environmental samples because of the complexity and diversity of the living microbial communities. Thus, we introduced meta-Spaler, an extension of Spaler, to handle metagenomics datasets. meta-Spaler partitions the reads based on their expected coverage and applies an iterative assembly. The results show an improvement in the assembly quality of meta-Spaler in comparison to the assembly of Spaler.

Committee:

Umit Catalyurek (Advisor); Kun Huang (Committee Member); Fusun Ozguner (Committee Member)

Subjects:

Bioinformatics; Computer Engineering

Keywords:

bioinformatics; sequence similarity; indexing; graphical processing unit; Apache Spark; de Bruijn graph; de novo assembly; metagenomics
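
The abstract above says that assembly starts with de Bruijn graph construction. As a rough single-machine illustration of that first step only, the sketch below builds a de Bruijn graph from short reads in plain Python. It does not reflect Spaler's Spark and GraphX implementation; the function name `build_de_bruijn` and the toy reads are assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer in a read becomes an edge from
    its (k-1)-length prefix to its (k-1)-length suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads; with k=3 they share the k-mer "CGT".
graph = build_de_bruijn(["ACGT", "CGTA"], k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", sorted(successors))
```

Walking unbranched paths in such a graph is what yields contigs; edge multiplicities, reverse complements and error correction are deliberately left out of this sketch.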

Sweeney, Blake Alexander
Building Representative Sets Of RNA 3D Structures and Selecting High Quality Loops
Doctor of Philosophy (Ph.D.), Bowling Green State University, 2016, Biological Sciences
This dissertation contains two types of work. The first is the creation and maintenance of our data pipeline. This chapter focuses on the technical work behind the extension of our pipeline. In general, this work extends our previous pipeline to import more data and standardizes several parts of the pipeline. As a result, this work provides a framework for future modifications of the pipeline. This work was driven both by the move to providing RNA 3D structures in mmCIF format instead of the more limited PDB format, and by the need to clean up the previous version of the pipeline. The second type of work is scientific. It includes creating equivalence classes for all RNA 3D structures, using these classes to build representative sets, and then using these representative sets along with new quality data to select a set of high quality loops for future analysis. The new work on equivalence classes and representative sets was driven by the move from PDB to mmCIF formats. This move forced the redesign of the previous method, as it would only use the largest chain in each PDB file. This change allowed me to reconsider the approach and allowed several improvements. The work on loop quality was prompted by the release of new structure quality data, Real Space R Z-Score (RSRZ). This data allows the examination of how well a proposed structure fits the data it is built from. By using this we can limit our studies of RNA loops to only those that are from high quality, well modeled structures.

Committee:

Neocles Leontis (Advisor); Raymond Larsen (Committee Member); George Bullerjahn (Committee Member); Hans Wildschutte (Committee Member); Howard Cromwell (Other)

Subjects:

Bioinformatics; Biology

Keywords:

RNA; 3D Structure; RSR; Motifs; X-ray Quality; RSRZ

Unoarumhi, Yvette Ochuwa
Evolution of a Bacterial Global Regulator- Lrp
Master of Science in Biomedical Sciences (MSBS), University of Toledo, 2016, Biomedical Sciences (Bioinformatics and Proteomics/Genomics)
Global regulators each control hundreds of genes in bacteria, and it is still unclear how these regulators evolve, especially considering that gene regulation changes more rapidly than the regulated genes themselves. Leucine-responsive regulatory protein (Lrp) is a global regulator in enteric bacteria, controlling both metabolic and virulence-associated genes. Lrp orthologs are found among both Bacteria and Archaea. Surprisingly, even within the phylum γ-Proteobacteria, Lrp is a global regulator in some orders and a local regulator in others. This raises important questions about the evolution of Lrp functions. The way global regulators function is crucially important to bacterial physiology. This thesis presents studies on the evolution and regulation pattern of Lrp, carried out with the goal of providing insights into global regulators more generally. Two independent studies of Lrp were carried out. The first compared Lrp sequences from four bacterial orders within the γ-Proteobacteria: Enterobacteriales, Vibrionales, Pasteurellales, and Alteromonadales. AsnC was also analyzed in parallel for comparison, as it is a paralog of Lrp that in all known cases is a local regulator controlling a small number of genes. As expected, Lrp and AsnC sequences formed two distinct clusters diverging from a common ancestor. These each divided into subclusters representing the Enterobacteriales, Vibrionales, and Pasteurellales. However, the Alteromonadales did not yield unitary clusters for either Lrp or AsnC, in contrast to the expected order-specific clustering we observed with the control housekeeping genes for 16S rRNA and RNA polymerase subunit RpoB. Logo analysis was also used to compare Lrp and AsnC in these four orders, and clear sequence signatures were identified.
Ultimately, the Logo analysis provided the testable hypotheses that the globally-acting Lrp orthologs have short conserved sequences (particularly at the two ends of the polypeptides), and that Alteromonadales is unique among the orders tested in having member species with global or local Lrp orthologs. The second study focused on the manner in which Lrp protein regulates expression of its own gene (lrp). In E. coli Lrp represses lrp. However, it has been reported that Lrp activates the lrp gene in Vibrio cholerae. This can have major consequences, since Lrp controls so many genes. To address this question we measured lrp expression in a different V. cholerae strain, in the presence and absence of Lrp, using a Plrp-lacZ transcriptional fusion. While the V. cholerae strain background and growth medium differ from the original study, the results indicate that Lrp represses lrp in V. cholerae, as in E. coli. Our studies of Lrp provide better understanding of global regulators, including testable hypotheses for future studies.

Committee:

Matson Jyl (Committee Chair); Blumenthal Robert (Committee Member); Federov Alexei (Committee Member)

Subjects:

Bioinformatics

Keywords:

Transcription factors, Phylogenomics, Enterobacteriales, Vibrionales, Pasteurellales, Alteromonadales, autoregulation, global regulator, bacteria

Manivannan, Sathiya Narayanan
TRANSCRIPTIONAL CONTROL OF AN ESSENTIAL RIBOZYME AND AN EGFR LIGAND REVEAL SIGNIFICANT EVENTS IN INSECT EVOLUTION
Doctor of Philosophy, The Ohio State University, 2015, Molecular, Cellular and Developmental Biology
In this thesis I examined the regulation of two Drosophila genes: RNase P RNA (RPR), which codes for the ribozyme component of an essential pre-tRNA processing enzyme, and vein (vn), which encodes a secreted ligand for the Epidermal growth factor receptor (Egfr). These two genes represent two different modes of gene regulation: while RPR is ubiquitously expressed, vn has a complex pattern of expression in specific tissues. Further, RPR is present in the intron of the Drosophila ATPsynC gene and transcriptionally co-regulated with the recipient gene. In contrast, vn is an independent gene with a complex promoter. The transcriptional regulation of Drosophila RPR is intriguing because it lacks signals for Pol III transcription and is in the intron of a Pol II transcribed gene. This is in contrast to other eukaryotic RPR genes studied thus far, which are all typical Pol III regulated genes. Using biochemical analyses, I have demonstrated that the annotated gene, the only copy of RPR in the genome, codes for the bona fide Drosophila RPR. My reporter gene study demonstrated that RPR is produced in a splicing independent fashion and its biogenesis is dependent on the Pol II promoter of the recipient gene. Pol II dependent transcription of RPR seems to be a hallmark of two major groups in Arthropods: Hexapods and Vericrustaceans. Current data support a genetic event, occurring approximately 500 million years ago, that caused the switch from an independent Pol III transcribed RPR to a Pol II dependent RPR. After the initial change the Pol II transcribed RPR moved again, as is evident from the different RPR recipient genes in different orders of Hexapoda. The orthologs of RPR recipient genes in D. melanogaster are expressed throughout development in all tissues, suggesting that ubiquitous expression may be one of the characteristics of RPR recipient genes.
While the transcriptional coupling leads to the ubiquitous expression of RPR, vn is expressed in a dynamic pattern in the wing imaginal disc. The development of the two cell-layered wing imaginal disc, which gives rise to the adult wing and the body wall, is dependent on Egfr signaling mediated by Vn. I found that Dpp-mediated TGF-Beta signaling from the peripodial cell layer induces the de novo expression of vn in the adjacent disc proper cell layer. This positive paracrine signaling by Dpp is transient and unidirectional. Subsequent to direct paracrine induction by Dpp, the vn expression domain expands in the dorsal part of the disc via a positive feedback loop involving the ETS transcription factor Pointed P2 (PntP2). The expression of vn in the ventral part of the disc is limited by the inhibitory effect of Wg and the restriction of pntP2 expression. A gene regulatory network (GRN), involving Vn, Dpp and Wg signaling pathways, is crucial for the patterning of the early wing disc. The GRN sub-circuit, which regulates early wing disc development, is essential for Drosophila wing and body wall development and it will be important to determine if this GRN sub-circuit is conserved in other insects.

Committee:

Amanda Simcox (Advisor)

Subjects:

Biochemistry; Bioinformatics; Biology; Genetics; Molecular Biology

Keywords:

Drosophila, RNase P, RNase MRP, Ribonucleoprotein complex, Intron, Development, Wing disc, Dpp, Vn, Wg, Gene regulatory network, Pnt P2, ETS

Sundaramurthy, Gopinath
A Probabilistic Approach for Automated Discovery of Biomarkers using Expression Data from Microarray or RNA-Seq Datasets
PhD, University of Cincinnati, 2016, Medicine: Systems Biology and Physiology
The response to perturbations in cellular systems is governed by a large number of molecular circuits that coalesce into a complex network. In complex diseases, the breakdown of cellular components is brought about by multiple molecular and environmental perturbations. While individual signatures of cellular components might vary significantly among clinical patients, commonality in signs and symptoms of disease progression is a compelling indicator that key cellular sub-processes follow similar trajectories. Our approach aims for an enhanced understanding of the effect of disease perturbations on the cell by developing an automated platform that assigns more significance to changes that occur at the sub-network level, focusing on genes that are “wired” together and change together. The platform that we have developed is motivated by the study of concomitant expression changes in sub-networks. The analysis by our platform produces a small subset of signaling and regulatory genes that are wired together and change together beyond random chance. In order to evaluate the effectiveness of our platform in producing subsets that can distinguish diseases and disease-subtypes, we used publicly available RNA-Seq and microarray breast cancer expression datasets. Each dataset was analyzed independently using our platform and the disease related sub-network perturbations among breast cancer subtypes were identified. The resulting subset was subjected to standard multi-way classification and predictions based on our approach were compared with PAM50 predictions. Biomarkers identified from the microarray and RNA-Seq datasets reproduced the PAM50 classification with 100% and 80% agreement, respectively, despite having only 10% of genes in common with the PAM50. This proof-of-concept analysis using breast cancer datasets is indicative of the platform’s stable cross-validation results.
This platform can potentially be used for automated and unbiased computational discovery of disease related genes. Our results suggest that probabilistic and automated approaches may offer a powerful complement to existing approaches by providing an unbiased initial screen.

Committee:

Steven Kleene, Ph.D. (Committee Chair); Judith Heiny, Ph.D. (Committee Member); Anil Jegga, D.V.M. (Committee Member); Jaroslaw Melle, Ph.D. (Committee Member); Yana Zavros, Ph.D. (Committee Member)

Subjects:

Bioinformatics

Keywords:

Biomarker discovery; Probabilistic modeling; Network analysis; Expression Analysis; Complex disease; Genomics

Hariharan, Janani
Predictive Functional Profiling of Soil Microbes under Different Tillages and Crop Rotations in Ohio
Master of Science, The Ohio State University, 2015, Environmental Science
Food production and security are dependent on maintaining soil health and quality. Thus, sustainable and healthy soil function is a top priority for scientists and land managers. One of the most important factors that influences soil function is the microbial community. Recent advances have allowed us to quantify more accurately the composition of such communities, but there is still a knowledge gap with regard to the contribution of microorganisms to various processes occurring in the soil. Understanding this will facilitate the development of healthier agroecosystems. In this thesis, a predictive functional approach is used to elucidate bacterial species–function relationships. Bacterial community profiles were compared across two tillage systems and two crop rotations in Northern Ohio (Wooster and Hoytville). 16S rRNA gene-targeted sequencing was performed and the raw data obtained were filtered, denoised and processed using QIIME. Open-reference OTU picking and taxonomic assignment were performed using the Greengenes database. I then used a computational approach called PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) to predict metagenomes and the most likely functions performed by individual species of bacteria. Sequence analysis reveals a large number of unidentified OTUs, which is consistent with our expectations of the soil ecosystem. Comparison of sequencing data from different platforms indicates that the dataset generated using Illumina sequencing provided better hits with the reference database than pyrosequencing, and was associated with a greater number of putative soil bacterial functions. PICRUSt allows an estimation of the level of involvement each OTU has with a specific gene function, which enables comparisons to be made across bacterial species and treatment conditions.
Predicted functions of the bacterial community revealed a large number of proteins connected with metabolism and maintenance of natural organic molecules in soil as well as enzymes related to degradation of xenobiotics. Using this approach, I was also able to map specific OTUs to their functional potential. Bacterial enzymes implicated in the cycling of nitrogen, sulfur, carbon and methane through the soil were examined, as were enzymes that catalyzed the oxidative degradation of hydrocarbon compounds that are considered soil pollutants. Specialized groups of bacteria were linked to functions like nitrogen fixation and degradation of compounds like atrazine and chlorohydrocarbons. A broader range of OTUs was found to contain genes for carbon utilization and sulfur metabolism. These predictions are supported by previous ecological studies. There were other OTU-function relationships predicted in these studies that are novel and could be valuable in identifying commercially important microorganisms. These leads will require experimental validation. A clear difference was seen between the no-till and plow-till treatments, with no-till being functionally enriched for most major nutrient cycles. No such differences were observed between the different crop rotations. Proteobacteria, Actinobacteria and Acidobacteria were some of the most abundant phyla found in these soil samples, along with Nitrospirae, and Bacteroidetes. I concluded that long-term and continuous application of different tillage systems, and to a lesser extent crop rotation, result in unique bacterial communities that affect the overall functioning of the soil.

Committee:

Warren Dick (Advisor); Parwinder Grewal (Advisor); Margaret Staton (Committee Member)

Subjects:

Agriculture; Biogeochemistry; Bioinformatics; Ecology; Environmental Science; Microbiology; Soil Sciences

Keywords:

PICRUSt; soil metagenomics; soil bacteria; soil function; nutrient cycling

Marwaha, Shruti
A Genomics and Mathematical Modeling Approach for the Study of Helicobacter Pylori associated Gastritis and Gastric Cancer
PhD, University of Cincinnati, 2015, Medicine: Systems Biology and Physiology
Gastric cancer is the fifth most common malignancy in the world and the third leading cause of cancer-related mortality worldwide, with a five-year survival rate of only 20-29%. In order to develop better drugs, diagnostics and preventive measures for gastric cancer, it is critical to understand the underlying molecular biology of the disease and factors that increase the risk for the disease. Helicobacter pylori-induced chronic gastritis is a major risk factor associated with gastric cancer development. We analyzed publicly available gene expression data from patients with gastric cancer and patients with H. pylori mediated gastritis, to identify genes and pathways that play an important role in the two diseases. We further integrated the identified disease signature with the Broad Institute’s Connectivity Map to identify and prioritize drugs that can potentially reverse the molecular signature of gastric cancer cells and that of gastric tumors resistant to cisplatin-fluorouracil (CF) chemotherapy. Our analysis identified vorinostat, trichostatin A and thiostrepton as potential therapeutic compounds for gastric cancer treatment. We identified genes and pathways that are differentially expressed (57 up-regulated and 86 down-regulated) in both gastric cancer and H. pylori mediated atrophic gastritis. The topmost pathways enriched for these genes include cell-cell adhesion/communication, tight junctions, leukocyte transendothelial migration, gastric acid secretion, potassium ion transport and creatine pathways. Analysis of CF resistant and sensitive tumors suggests a role for metabolic and statin pathways in resistance to the chemotherapy. We also developed a mathematical model of a sub-network comprising sonic hedgehog (SHH), pro-inflammatory cytokines and anti-inflammatory cytokines, which play a critical role in H. pylori mediated gastritis. We integrated qPCR results, mathematical modeling techniques and microarray data from H. pylori infected mice to explore the temporal behavior of the cytokine-SHH sub-network. Our mathematical model suggests that NF-κB, SHH and the cytokines engage in a feedback loop which can result in damped oscillations. The model helps to bring out emergent properties of the network and generate testable hypotheses. Future experiments capturing cytokine and SHH profiles over time can reveal more insights about the relationship between the different genes and their regulation, and improve our current understanding of the dynamics and sequence of the events in the system.

Committee:

Nelson Horseman, Ph.D. (Committee Chair); Mario Medvedovic, Ph.D. (Committee Member); Marshall Montrose, Ph.D. (Committee Member); Yana Zavros, Ph.D. (Committee Member); Hamid Eghbalnia, Ph.D. (Committee Member)

Subjects:

Bioinformatics

Vyas, Aditi
Identification of Novel Stat92E Target Genes in Drosophila Hematopoiesis
Doctor of Philosophy (PhD), Ohio University, 2016, Molecular and Cellular Biology (Arts and Sciences)
The Jak/Stat signaling pathway is one of the most conserved signaling pathways regulating cellular processes such as cell proliferation and cellular differentiation. Mutations in Jak that make it constitutively active are implicated in the development of leukemia and myeloproliferative disorders in humans. A dominant mutation in the Drosophila Janus Kinase (or hopscotch) gene called hopTum-l causes an increase in Jak/Stat pathway activity levels and significantly increases the hemocyte count. Removal of one copy of the Phosphatase 61F gene, a negative regulator of the Jak/Stat pathway, in the hopTum-l background further increases pathway activity at the molecular level, but surprisingly causes a reduction in the hemocyte count. Our results suggest that the Drosophila Signal Transducer and Activator of Transcription (Stat92E), can regulate the expression of two different sets of target genes. Low threshold genes (LTG) are expressed at moderate levels of Jak/Stat pathway activity and high threshold genes (HTG) are activated at much higher levels of Jak/Stat pathway activity. Based on our hypothesis we propose a model that predicts certain transcriptional repressors negatively regulate the expression of the HTGs at moderate levels of Jak/Stat pathway activity. Loss of function (LOF) screening helped us identify C-terminal binding protein (CtBP) and Suppressor of Hairless [Su(H)] as potential transcriptional repressors of HTGs. Two independent in silico approaches were used to discover possible regions in the Drosophila melanogaster genome that have Stat92E and repressor binding sites within 1000 bp of each other. These scans identified thirty-three potential Jak/Stat pathway target genes. RNAi analysis of thirty of these candidates was performed in the hopTum-l background to examine their effect on hematopoiesis and classify them as either LTGs or HTGs. 
Eleven of the thirty genes showed a genetic interaction with the Jak/Stat pathway and these eleven genes were then examined for expression in different genetic backgrounds using quantitative PCR assay. Eight genes showed an increase in expression in genotypes with higher Jak/Stat pathway activity, and these eight genes were selected as potential Stat92E target genes. Lastly, we examined the expression of the selected genes in Drosophila hemocytes. The hemocyte-specific expression of the identified genes provides support to a novel role of these genes in Drosophila hematopoiesis, and possibly also in hematopoietic tumor development.

Committee:

Soichi Tanda (Advisor); Mark Berryman (Committee Member); Donald Holzschu (Committee Member); Sarah Wyatt (Committee Member)

Subjects:

Bioinformatics; Genetics; Molecular Biology

Keywords:

Drosophila hematopoiesis; Jak Stat pathway; Stat92E Target genes; CtBP; Suppressor of Hairless
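
The abstract above describes in silico scans for genomic regions where Stat92E and repressor binding sites lie within 1000 bp of each other. The sketch below shows one simple way such a proximity scan could be done on lists of site coordinates. It is not the thesis's pipeline; the function name `paired_sites`, the coordinate lists, and the assumption that sites are single positions on one chromosome are all illustrative.

```python
from bisect import bisect_left, bisect_right


def paired_sites(stat_sites, repressor_sites, max_gap=1000):
    """Return (stat_pos, repressor_pos) pairs whose positions are at most
    max_gap bp apart. Positions are assumed to be on the same chromosome."""
    repressor_sorted = sorted(repressor_sites)
    pairs = []
    for s in sorted(stat_sites):
        # Binary search for the window [s - max_gap, s + max_gap].
        lo = bisect_left(repressor_sorted, s - max_gap)
        hi = bisect_right(repressor_sorted, s + max_gap)
        pairs.extend((s, r) for r in repressor_sorted[lo:hi])
    return pairs


# Hypothetical coordinates, for illustration only.
pairs = paired_sites([100, 5000], [900, 1101, 4000, 7000])
print(pairs)
```

Sorting one list and using binary search keeps the scan efficient for large site lists; genes near the resulting pairs would then be the candidates to follow up experimentally.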

Holtzapple, Emilee R
RelA as a Potential Regulator of Inflammation and Tissue Damage in Streptozotocin-Induced Diabetic STAT5 Knockout Mice
Bachelor of Science (BS), Ohio University, 2016, Biological Sciences
Type 1 Diabetes (T1D) affects 1.25 million Americans, and that number is expected to increase to 5 million by 2050. Failure to properly control blood glucose levels in T1D can result in life-threatening complications such as kidney damage, also known as diabetic nephropathy (DN), and end-stage renal disease (ESRD). As the incidence of T1D continues to rise worldwide, understanding DN becomes more important. This can be accomplished by examining the molecular mechanisms of damage in DN. It has been shown that the loss of STAT5 in diabetic mice exacerbates diabetic kidney damage. In this study, we used pathway analysis software to analyze gene expression results previously obtained from a microarray experiment using this diabetic STAT5 knockout (DB SKO) mouse model. We found that expression of many immune system pathways was significantly altered in the kidneys of DB SKO mice, as compared to nondiabetic and wildtype control mice. A number of different immune cell functions were also predicted to be altered. The RelA gene encoding the p65 subunit of NF-κB was predicted to be a common or “master” regulator of many of the differentially expressed genes within our dataset. Using chromatin immunoprecipitation, we found altered numbers of p65-DNA binding interactions in the promoters of differentially expressed genes within the DB SKO kidney, again in comparison to the nondiabetic and wildtype control kidneys. Therefore, our analyses indicate that STAT5 may act through RelA to affect immune system signaling pathways, resulting in an increase in inflammation and tissue damage in the absence of STAT5.

Committee:

Karen Coschigano, Ph.D. (Advisor)

Subjects:

Bioinformatics; Biomedical Research

Keywords:

Type 1 Diabetes; Diabetic Nephropathy; STAT5; Pathway Analysis

Alouani, David James THE AGING PROCESS OF C. ELEGANS VIEWED THROUGH TIME DEPENDENT PROTEIN EXPRESSION ANALYSIS
Master of Sciences, Case Western Reserve University, 2015, Systems Biology and Bioinformatics
The main goal of the present effort is to develop a comprehensive computational and statistical framework for analyzing large-scale proteomics data and understanding the aging process of Caenorhabditis elegans (C. elegans) nematodes based on the age-dependent pattern of protein expression. Modern numerical methods were used for the analysis, including outlier detection, imputation, entropy and feature selection. Protein expression in C. elegans was found to be highly age dependent. Increased expression in younger nematodes was associated with the activation of metabolic pathways. House-keeping processes, such as proteolysis, protein biogenesis and assembly, were found to be important at older age. Network entropy was also found to be age dependent for a significant fraction of proteins. Increased protein expression was associated with reduced entropy. Feature selection analysis further showed that proteins linked to metabolic processes are most predictive of the age of C. elegans nematodes, based on their level of expression.

Committee:

Masaru Miyagi, Dr. (Advisor); David Lodowski, Dr. (Committee Member); Gurkan Bebek, Dr. (Committee Member)

Subjects:

Aging; Bioinformatics; Biology; Biostatistics; Computer Science

Zelinka, Lisa M PROTEIN EXPRESSION AND CHARACTERIZATION OF THE MAJOR AUTOANTIGEN (TITIN DOMAIN) ASSOCIATED WITH AUTOIMMUNE RIPPLING MUSCLE DISEASE
PHD, Kent State University, 2015, College of Arts and Sciences / School of Biomedical Sciences
An autoimmune disease is distinguished by the appearance of several autoantibodies. These autoantibodies can react with components of surface, cytoplasmic or nuclear origin. Autoantibodies can confirm diagnosis and prognosis as well as monitor progression of a certain autoimmune disease. Autoimmune rippling muscle disease (ARMD) is an autoimmune neuromuscular disease associated with myasthenia gravis (MG). Rippling muscle disease is diagnosed by percussion or stretch stimulated wave-like muscle contractions. The propagation of these contractions does not involve motor unit action potentials (MUAPs). Past experiments in our laboratory recognized a very high molecular weight skeletal muscle protein antigen identified by ARMD patient antisera as the titin isoform N2-A, ATP synthase 6 and PPP1R3. These past studies used antisera from ARMD and MG patients as probes to screen a human skeletal muscle cDNA library and several pBluescript clones revealed supporting expression of immunoreactive peptides. Previous experiments in our laboratory have subcloned the immunoreactive domain of titin isoform N2-A into pGEX-3X G-S-T fusion vector (G3RMMG6). Sequence analysis of the glutathione-S-transferase/Titin N2-A fusion gene indicates the cloned titin domain (GenBank accession # EU428487) is in frame and is derived from a sequence of N2-A spanning exons 248-250, an area that encodes the fibronectin III domain. PCR and EcoRI restriction mapping studies have demonstrated that the inserted cDNA is of a size that is predicted by bioinformatics analysis of the subclone. Expression and affinity purification of the fusion protein result in the isolation of a polypeptide of 52 kDa, consistent with the predicted inferred amino acid sequence. Immunoblot experiments of the fusion protein, using rippling muscle/myasthenia gravis antisera, show that only the titin domain is immunoreactive.
Current experiments in our laboratory have affinity purified the autoantibody from the serum of a patient with ARMD, MG and thymoma using the Olmsted method. The affinity purified autoantibody was used for immunofluorescent microscopy. The Olmsted affinity purified autoantibody demonstrated immunoreactivity in a clear and concise striational banding pattern consistent with the striational banding pattern of the I and A bands of human skeletal muscle. Future studies will use the tools developed in this study to explore the functional role of the exon 248-250 domain in muscle contractility.

Committee:

Gary Walker, Dr. (Committee Chair); Eric Mintz, Dr. (Committee Member); Fayez Safadi, Dr. (Committee Member); Wen-Hai Chou, Dr. (Committee Member); Robert Clements, Dr. (Committee Member)

Subjects:

Bioinformatics; Biomedical Research

Keywords:

Autoimmune Rippling Muscle Disease; Myasthenia Gravis; Autoantibody; Protein expression; Protein Purification; Genetic Manipulation of E coli; Bioinformatics

Stetson, Lindsay C Computational Approaches for Cancer Precision Medicine
Doctor of Philosophy, Case Western Reserve University, 2015, Systems Biology and Bioinformatics
Many types of cancer have no proven means of prevention or effective therapies. Precision medicine is an emerging approach for disease treatment that takes into account the biology of the patient in an effort to improve therapeutic outcome. While significant advances have been made in precision medicine when it comes to select cancers such as breast and lung, precision medicine is still not used by clinicians when initiating treatment for most cancer patients. Advances in DNA sequencing and large-scale studies such as The Cancer Genome Atlas (TCGA) have led to a better understanding of the molecular initiators and drivers of cancer, but challenges remain in bringing precision medicine from the bench to the bedside. A major impediment to achieving personalized therapy is the small number of drugs developed to target the proteins encoded by potential driver genes. Additionally, the large numbers of genetic aberrations in cancer discovered through next-generation sequencing have not always translated into actionable drug targets. In this dissertation these two challenges are addressed. First, we demonstrate that data from large-scale pharmacogenomic studies can be computationally mined to create omic signatures of drug response. The benefit of this study is the ability to rapidly and cost-effectively identify drugs and research compounds that can be repositioned or repurposed for use in different cancer types. Additionally, we demonstrate that this approach can successfully identify the precise subgroup of cancer patients that will benefit from a drug treatment based on their unique tumor biology. We then show that proteomics studies completed as part of TCGA can be computationally mined to create protein models predictive of patient survival. The resulting protein signatures can be used by clinicians to identify those patients that are high-risk and should be treated more aggressively or referred to clinical trial. 
Additionally, proteins that are correlated to patient survival are potential actionable drug targets. Both drug development and clinical trials are expensive; computational approaches such as those described in this dissertation are critical to the cost-effective and timely development of precision medicines.

Committee:

Jill Barnholtz-Sloan, PhD (Advisor); Jean-Eudes Dazard, PhD (Committee Member); Thomas LaFramboise, PhD (Committee Member); Andrew Sloan, MD (Committee Member)

Subjects:

Bioinformatics

Sousounis, Konstantinos Gene Expression During Newt Lens Regeneration and Cephalopod Eye Evolution
Doctor of Philosophy (Ph.D.), University of Dayton, 2014, Biology
Newts are known for their ability to regenerate lost body parts. In contrast to many other organ systems, lens regeneration has many advantages. The eye lens can be removed as a whole and regeneration can occur through transdifferentiation of dorsal iris cells while ventral iris can be used as natural non-regenerating control. We have used microarrays, RNA-sequencing and mass spectrometry in dorsal and ventral iris samples during early phases of lens regeneration. The selected time points cover the undamaged control at 0 days post-lentectomy (dpl), the reentry of the cell cycle at 4 dpl and the beginning of transdifferentiation at 8 dpl. The newly assembled newt transcriptome was used to obtain annotation and gene expression measurements on newt genes in our samples. Functional analysis revealed genes related to redox balance, DNA repair, regulation of gene expression, cytoskeleton, immune response, metabolic processes, and cell cycle to be enriched in dorsal iris during regeneration time points. These events were associated with the transdifferentiation initiated in the dorsal iris. In addition, comparative transcriptomic and proteomic analyses using high-throughput gene expression data from other amphibian regeneration systems implicated response to stress, proliferation and migration, and cellular reprogramming to be a common program required for regeneration. Gene expression data from newt lens regeneration were extensively validated with quantitative real time polymerase chain reaction. Furthermore, microarrays in young and old axolotls, another amphibian model that was found capable of lens regeneration from the iris for a short window of two weeks after hatching, were used. Functional annotation indicated that young regeneration-competent axolotls expressed genes related to regulation of gene expression, electron transport chain, cell cycle, DNA repair and metabolic process – gene groups belonging to the common regeneration program.
In addition, we implicated immune response and cell differentiation in repression of lens regeneration in old axolotl iris. Cephalopods are protostome animals that exhibit an impressive vertebrate-like camera-type eye that facilitates high quality vision. Nautilus, however, has a pinhole eye that lacks cornea and lens. We used RNA-sequencing in developing Nautilus and pigmy squid embryos in order to gain more insights into cephalopod eye evolution. Pathway analysis of genes expressed only in Nautilus or pigmy squid developing eyes revealed that the SIX3/6 gene is not expressed in the Nautilus. In addition, expression of all the genes regulated by this transcription factor was absent. Since SIX3/6 is necessary for lens development in vertebrates and the gene network between vertebrates and invertebrates is highly conserved, we argued that the absence of SIX3/6 in Nautilus leads to the pinhole eye. Functional and molecular evolution analyses of the Nautilus and pigmy squid transcriptomes revealed gene selections, and a gene duplication which might be associated with cephalopod eye evolution as well as in developing a vertebrate-like camera-type eye with invertebrate rhabdomere photoreceptors. The use of high-throughput methods in studying gene expression during newt lens regeneration and cephalopod eye evolution provided us with valuable insights into the underlying mechanisms in these systems.

Committee:

Tsonis Panagiotis, PhD (Advisor); Singh Amit, PhD (Committee Member); Kango-Singh Madhuri, PhD (Committee Member); Williams Thomas, PhD (Committee Member); Del Rio-Tsonis Katia, PhD (Committee Member)

Subjects:

Bioinformatics; Biology; Molecular Biology

Keywords:

regeneration; lens; newts; eye evolution; cephalopods; gene expression

Balanis, Nikolas G DIVERSE ROLES FOR EGF RECEPTOR SIGNALING IN THE BREAST CANCER TUMOR MICROENVIRONMENT
Doctor of Philosophy, Case Western Reserve University, 2013, Physiology and Biophysics
The ligand/cell surface receptor interaction is a paradigm for how cells utilize extracellular cues to 'signal' to their intracellular environment. Ligand/receptor interactions are important in almost all biological processes. However, focusing solely on the ligand/receptor interaction excludes the many possible 'ligand-independent' modes of surface receptor action, vis-a-vis those that occur in 'trans'. Our studies sought to uncover the role of extracellular matrix, important molecules in the tumor microenvironment, as they function through integrin receptors to regulate the receptor tyrosine kinase, EGF Receptor (EGFR). This thesis links EGFR/integrin crosstalk to the control of cellular protrusions such as lamellipodia and filopodia, as well as the control of stress fiber formation necessary for cell contractility. We have also linked EGFR/integrin crosstalk to activation of the Signal transducer and activator of transcription 3 (Stat3). We have shown that activation of Stat3 is necessary for EGFR-induced transformation of normal mammary epithelial cells. We provide mechanistic insights into how certain breast cancers “switch” from EGFR/Stat3 signaling to Fibronectin/Stat3 signaling following epithelial-to-mesenchymal transition (EMT). We have shown that following EMT, breast cancers are desensitized to EGFR inhibition and become sensitized to Janus kinase 2 (Jak2) inhibitors. This finding may describe why certain cancers are resistant to EGFR-directed therapies. Finally, we have identified the EGFR inhibitor protein Mitogen inducible gene 6 (Mig6) as essential to cell survival in Triple-negative breast cancer (TNBC). The observations provided in this thesis uncover the role of EGFR in the tumor microenvironment and provide insight into novel therapies for TNBC.

Committee:

Witold Surewicz (Committee Chair); Cathleen Carlin (Advisor); Kalnay Brady (Committee Member); Tom Egelhoff (Committee Member); Stephen Jones (Committee Member)

Subjects:

Biochemistry; Bioinformatics; Microbiology; Molecular Biology

Keywords:

breast cancer; EGFR; EGF Receptor; Triple Negative Breast Cancer; Epithelial-to-Mesenchymal Transition; EMT; focal adhesion; fibronectin; IL-6; JAK2; MIG6; STAT3; TNBC; FN; PYK2; FAK; NMUMG; Basal-like

Nicol, Megan E Unraveling the Nexus: Investigating the Regulatory Genetic Networks of Hereditary Ataxias
Bachelor of Science (BS), Ohio University, 2014, Biological Sciences
Hereditary ataxias are complex, rare autosomal recessive diseases that receive limited funding and public attention. While it is known that the most common types of hereditary ataxia are caused by mutations in a single gene, the extent to which molecular pathways, like DNA repair, are changed remains largely unknown. In order to learn more about how the five main DNA repair mechanisms are altered in the disease state and what changes are common across hereditary ataxia, four microarray datasets representing Friedreich's Ataxia, Ataxia Telangiectasia, and Spinocerebellar Ataxia Type 2 were obtained from NCBI. Using R and three Bioconductor annotation packages, the fold change level of each probe was calculated and mapped to corresponding gene symbols. Discriminative motif finding was performed on promoter regions of genes of interest, which represented possible transcription factor binding sites. In order to understand the protein interactions of each DNA repair pathway, the STRING database tool was employed, and the connections established here were combined with all other results to produce an informative network image for each DNA repair pathway. Our findings showed that DNA repair mechanisms in each form of ataxia shared three similarities, but that each disease had unique differences that may have implications for the differences in disease presentation. In all ataxias investigated in this study, OGG1 and RAD50 are underexpressed, while PMS1 is overexpressed. Furthermore, each ataxia has at least one form of DNA ligase that is underexpressed, which likely hinders the ability to fully fix DNA breakage. These results have the potential to be used by future researchers as targets for therapy or in the development of diagnostic tests. We conclude that there is shared differential expression of key DNA repair genes among hereditary ataxias, and these similarities may help us understand why the presentation of these diseases is so similar.
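
The fold-change step described in this abstract can be illustrated with a minimal sketch. The thesis used R and Bioconductor; the Python below is only an illustration under stated assumptions: intensities are raw positive values (not already log-transformed), probes that map to the same gene symbol are averaged, and every name and number in the usage note is a made-up placeholder, not data from the thesis.

```python
from math import log2
from statistics import mean


def log2_fold_changes(disease, control, probe_to_symbol):
    """Per-gene log2 fold change (disease vs. control).

    disease, control: dict mapping probe ID -> list of positive intensities.
    probe_to_symbol: dict mapping probe ID -> gene symbol (hypothetical annotation).
    Probes without an annotation are reported under their own probe ID.
    """
    per_gene = {}
    for probe, values in disease.items():
        # Ratio of group means, expressed on a log2 scale.
        fc = log2(mean(values) / mean(control[probe]))
        symbol = probe_to_symbol.get(probe, probe)
        per_gene.setdefault(symbol, []).append(fc)
    # Average the probes that map to the same gene symbol.
    return {symbol: mean(fcs) for symbol, fcs in per_gene.items()}
```

For example, with toy values `disease = {"p1": [8.0, 8.0]}`, `control = {"p1": [2.0, 2.0]}` and `probe_to_symbol = {"p1": "GENE_A"}`, the function reports a log2 fold change of 2.0 for `GENE_A`; positive values indicate overexpression and negative values underexpression.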

Committee:

Tanda Soichi, Dr. (Advisor); Lonnie Welch, Dr. (Advisor)

Subjects:

Bioinformatics; Biology; Genetics

Keywords:

Ataxia; Hereditary Disease; Genetics; Computer Science; Bioinformatics

Li, Huameng Multiple Ligand Simultaneous Docking (MLSD) and Its Applications to Fragment Based Drug Design and Drug Repositioning
Doctor of Philosophy, The Ohio State University, 2012, Biophysics

This thesis presents a novel multiple ligand simultaneous docking (MLSD) method for simulating protein-ligand molecular recognition and a novel protocol for fragment-based drug design by combining MLSD and drug repositioning. Different cancer molecular targets, namely GP130 and STAT3, in the IL-6/GP130/STAT3 signaling pathway were used as use cases for the proposed MLSD and drug design protocol.

Conventional docking methods simulate only a single ligand at a time during the docking process. In reality, the molecular recognition process always involves multiple molecular species. The first part of this research developed an MLSD simulation method which can simulate the orchestrated action of multiple ligands binding to the active site of a protein. The methodology proved robust through systematic testing against several diverse model systems: E. coli PNP complex with two substrates, SHP2NSH2 complex with two peptides and cancer target Bcl-xL in complex with ABT-737 fragments. In the ABT-737 and SHP2NSH2 cases, conventional single ligand docking failed to find correct binding modes due to energetic and dynamic coupling among ligands, whereas MLSD resulted in the correct binding modes. In the PNP case, the MLSD simulations captured the binding dynamics, which is consistent with the enzymatic mechanism proposed from experiment. The work also compared two search strategies, Lamarckian Genetic Algorithm (LGA) and Particle Swarm Optimization (PSO), which had respective advantages depending on the specific systems.

Molecular docking finds its major applications in drug design and discovery. The conventional high-throughput screening (HTS) approach to drug discovery identifies many hits, but few of them can be developed into drugs. The second part of this research applied MLSD to fragment-based drug design and proposed a novel drug discovery protocol by combining MLSD and drug repositioning. It proceeds as follows. 1. A small library of drug scaffolds is identified for the binding hot spots of the target protein. 2. Selected drug fragments are simultaneously docked to protein binding sites by MLSD, like fitting the right piece into the right place in a jigsaw puzzle, and tethered properly to generate virtual drug templates. 3. Structure or chemical feature similarity search of template compounds on drug databases can potentially reposition existing drugs to new targets. Cancer targets GP130 and STAT3 were used as two test cases. In the case of STAT3, drug scaffolds were simultaneously docked into hot spots of the STAT3 SH2 domain by MLSD, followed by tethering to generate virtual template compounds. Similarity searching of virtual compounds in DrugBank identified Celecoxib as a novel inhibitor of STAT3. A few novel compounds were designed from virtual templates, which demonstrated more potent inhibition of STAT3. In the case of target GP130, our approach quickly identified the drugs Raloxifene and Bazedoxifene as novel inhibitors that disrupt IL-6/GP130 protein-protein interactions through targeting the GP130 D1 domain, which was confirmed by multiple cancer cell assays.

The MLSD and the drug discovery protocol presented hold exciting potential for modeling molecular recognition and fragment based drug design for other therapeutic targets.

Committee:

Chenglong Li (Advisor); Kun Huang (Committee Member); Michael Poirier (Committee Member); Guo-Liang Wang (Committee Member)

Subjects:

Bioinformatics; Biomedical Research; Biophysics; Molecular Biology; Molecular Chemistry; Pharmacology

Keywords:

multiple ligand simultaneous docking; MLSD; fragment based drug design; STAT3; GP130; Drug Repositioning; PSO

Nagavaram, Ashish Cloud Based Dynamic Workflow with QOS For Mass Spectrometry Data Analysis
Master of Science, The Ohio State University, 2011, Computer Science and Engineering

Lately, there is a growing interest in the use of cloud computing for scientific applications, including scientific workflows. Key attractions of cloud include the pay-as-you-go model and elasticity. While the elasticity offered by the clouds can be beneficial for many applications and use-scenarios, it also imposes significant challenges in the development of applications or services. For example, no general framework exists that can enable a scientific workflow to execute in a dynamic fashion with QOS (Quality of Service) support, i.e. exploiting elasticity of clouds and automatically allocating and de-allocating resources to meet time and/or cost constraints while providing the desired quality of results the user needs.

This thesis presents a case-study in creating a dynamic cloud workflow implementation with QOS of a scientific application. We work with MassMatrix, an application which searches proteins and peptides from tandem mass spectrometry data. In order to use cloud resources, we first parallelize the search method used in this algorithm. Next, we create a flexible workflow using the Pegasus Workflow Management System from ISI. We then add a new dynamic resource allocation module, which can use fewer or a larger number of resources based on a time constraint specified by the user. Finally we extend this to include the QOS support to provide the user with the desired quality of results. We use the desired quality metric to calculate the values of the application parameters. The desired quality metric refers to the parameters that are computed to maximize the user specified benefit function while meeting the time constraint. We evaluate our implementation using several different data-sets, and show that the application scales quite well. Our implementation effectively allocates resources adaptively and the parameter prediction scheme is successful in choosing parameters that help meet the time constraint.

Committee:

Gagan Agrawal, PhD (Advisor); Rajiv Ramnath, PhD (Committee Member); Michael Freitas, PhD (Committee Member)

Subjects:

Bioinformatics; Biomedical Engineering; Biomedical Research; Computer Engineering; Computer Science

Keywords:

cloud; dynamic workflow; adaptive execution on cloud; parallelization on cloud; time constraint execution; QOS on cloud; parameter prediction; parameter modeling

Gangadharaiah, Dayananda Sagar PATTERNS OF DIPEPTIDE USAGE FOR GENE PREDICTION
Master of Science in Computer Engineering (MSCE), Wright State University, 2010, Computer Engineering
As the number of complete genomes that have been sequenced continues to grow rapidly, the identification of gene regions in DNA sequence data remains one of the most important open problems in bioinformatics. Improving the accuracy of such gene finding tools by even a small percentage would affect the predictions for many genes of an organism (Zhu et al., 2010). This thesis presents a novel approach for identifying coding regions of a genome based on dipeptide usage. The patterns in dipeptide usage are used to discriminate between coding and non-coding DNA regions. Two-sample t-tests are used as tests of significance to determine the dipeptides that show a significant difference in their occurrences in coding and non-coding regions. These methods are primarily tested on the Escherichia coli 536 genome, where they reached an accuracy of 96.5% in identifying coding regions and 100% accuracy in identifying non-coding regions. The classifier trained on the Escherichia coli 536 genome is then used to predict the coding and non-coding regions of the Salmonella enterica subsp. enterica serovar Typhi genome. The results of these experiments showed an accuracy of 79.5% in predicting coding regions and 100% in predicting non-coding regions of the Salmonella enterica subsp. enterica serovar Typhi genome.
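
The dipeptide comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes the regions have already been translated to amino acid strings, uses overlapping dipeptides, and uses Welch's unequal-variance form of the two-sample t statistic, which the abstract does not specify. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def dipeptide_frequencies(protein):
    """Relative frequency of each overlapping dipeptide in an amino acid string."""
    pairs = [protein[i:i + 2] for i in range(len(protein) - 1)]
    total = len(pairs)
    return {dp: n / total for dp, n in Counter(pairs).items()}


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (assumed variant; unequal variances)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


def dipeptide_t_scores(coding, noncoding):
    """t statistic per dipeptide, comparing coding vs. non-coding translations.

    Each group needs at least two sequences; dipeptides whose frequencies do
    not vary in either group are skipped because the statistic is undefined.
    """
    freq_c = [dipeptide_frequencies(s) for s in coding]
    freq_n = [dipeptide_frequencies(s) for s in noncoding]
    scores = {}
    for dp in set().union(*freq_c, *freq_n):
        a = [f.get(dp, 0.0) for f in freq_c]
        b = [f.get(dp, 0.0) for f in freq_n]
        if variance(a) + variance(b) > 0:
            scores[dp] = welch_t(a, b)
    return scores
```

Dipeptides with large absolute t values are the ones whose usage differs most between the two groups; a significance threshold and the downstream classifier are left out of this sketch.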

Committee:

Travis Doom, PhD (Advisor); Michael Raymer, PhD (Committee Member); Sridhar Ramachandran, PhD (Committee Member)

Subjects:

Bioinformatics

Keywords:

DIPEPTIDE; coding regions; coding; endl; coding and non-coding regions; coding and non-coding; char

Chen, Jing Computational Selection and Prioritization of Disease Candidate Genes
PhD, University of Cincinnati, 2008, Engineering : Biomedical Engineering

Identifying causal genes underlying susceptibility to human disease is a problem of primary importance in the post-genomic era and in current biomedical research. Recently, there has been a paradigm shift of such gene-discovery efforts from rare, monogenic conditions to common "oligogenic" or "multifactorial" conditions such as asthma, diabetes, cancers and neurological disorders. These conditions are referred to as multifactorial because susceptibility to these diseases is attributed to the combinatorial effects of genetic variation at a number of different genes and their interaction with relevant environmental exposures. The expectation is that identification and characterization of the causal genes implicated in the inherited component of disease susceptibility will lead to substantial advances in our understanding of disease. These advances in turn can lead to improvements in diagnostic accuracy, prognostic precision, the range and targeting of available therapeutic options and ultimately realize the promise of personalized or "tailor-made" medicine. The objective of my thesis therefore is to design, develop, and validate computational approaches for identification and prioritization of these causal genes.

The first approach tests the hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships. We use a p-value-based meta-analysis method to prioritize the candidate genes based on functional annotation. For the very first time, we use, and demonstrate the utility of, mouse phenotype annotations in human disease gene prioritization. Since this approach is limited to only genes with functional annotation, and because many human genes are yet to be functionally classified, we have developed another approach that is independent of gene functional annotations. We implemented a set of new algorithms to prioritize genes based on protein-protein interaction networks. Large-scale cross-validation was performed for comparison and evaluation of the methods, and to determine the associated parameters. Our results demonstrate that the functional annotation-based method performs better than other approaches. Although the performance of the network-based method was not as good as that of the functional annotation-based method, it is much simpler to implement, apply, and execute. The best performance, however, was achieved by combining the results from the two methods, as demonstrated through an asthma test case.
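
A p-value-based meta-analysis of the kind mentioned above can be sketched as follows. The abstract does not name the combining rule, so this sketch assumes Fisher's method purely for illustration: the statistic -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form. Gene names and p-values in the usage note are invented placeholders.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The chi-square survival function with 2k degrees of freedom is
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, evaluated here with x/2 = -sum(ln p).
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def rank_candidates(gene_p_values):
    """Rank genes by combined p-value, most significant first.

    gene_p_values: dict mapping gene -> list of per-annotation p-values
    (a hypothetical input layout).
    """
    combined = {g: fisher_combined_p(ps) for g, ps in gene_p_values.items()}
    return sorted(combined.items(), key=lambda item: item[1])
```

For instance, `rank_candidates({"GENE_A": [0.01, 0.02], "GENE_B": [0.5, 0.5]})` places `GENE_A` first. Fisher's method assumes the combined p-values are independent, which may not hold when annotation sources overlap.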

Committee:

Bruce Aronow (Committee Chair); Anil Jegga (Committee Co-Chair); Marepalli Rao (Committee Member)

Subjects:

Bioinformatics

Keywords:

candidate gene prioritization; bioinformatics; interactome; mouse phenotype; asthma; protein-protein interaction network; functional annotation

Grigsby, Claude Curtis A Comprehensive Tool and Analytical Pathway for Differential Molecular Profiling and Biomarker Discovery
Doctor of Philosophy (PhD), Wright State University, 2013, Biomedical Sciences PhD
The key requirements to any empirically based study are to: (1) accurately measure and then compare the collected results in determining the result of the hypothesis being tested; and (2) collect a sample representative of the entities being studied. To demonstrate that an informatics tool can be designed that provides spectral registration, spectral and chromatographic alignment, visualization, and comparative analysis for data generated from multiple analytical platforms, e.g., LC-MS and GC-MS, the results and data analysis of five unique sets of experiments using a suite of novel informatics tools are presented. Comprehensive and reproducible sample collection techniques were developed concomitantly with the informatics tool and used in the multiple, independent studies for the validation and further development of generated software tools and approaches. Data from a dose-response study examining an organ specific environmental toxicant exposure was analyzed using the prototype software tool for discovery of LC/MS based metabolomic biomarkers. This data set served as proof of concept in the development and illustration of the novel approach to spectral registration and visualization, and illustrates the rapid multi-sample analysis capability of the informatics tool. A variety of additional studies focused on volatile biomarker discovery, i.e., a murine model of infection to select agents, characterization of human and murine urine as it ages, human markers of age and ethnicity in axillary odors, and characterization of the binding between volatile ligands and murine major urinary proteins aided in algorithm and interface development for GC/MS functionality implemented in the developed software. The final phase of this work focused on utilization of these analysis tools in combination with novel sampling techniques to create an end-to-end discovery pipeline for large-scale small molecule and volatile organic compound biomarker and differential profiling studies. 
This combination of biologically and environmentally focused studies was successfully completed as a final proof of concept for this work and demonstrates the universal utility of the approach.

Committee:

David Cool, PhD (Advisor); Mateen Rizki, PhD (Committee Member); Gerald Alter, PhD (Committee Member); Thomas Lamkin, PhD (Committee Member); Jeffery Gearhart, PhD (Committee Member)

Subjects:

Bioinformatics; Biomedical Research; Toxicology

Keywords:

biomarker; volatile organic compound; differential profiling; GCMS; mass spectral analysis

Paul, Sinu Host-pathogen interactions and evolution of epitopes in HIV-1: understanding selection and escape
PHD, Kent State University, 2012, College of Arts and Sciences / Department of Biological Sciences
Epitopes, the specific regions encoded by pathogens’ genomes that are recognized by the host immune receptors, play an important role in host-pathogen interaction and in turn, in determining the progression of the disease. While some of the mutations in epitope regions make the immune system unable to recognize epitopes and thus result in escape from the host immune response, other mutations may significantly reduce the viral fitness. This study explores the patterns of molecular evolution across different epitope regions and identifies a set of highly conserved epitopes in the Human Immunodeficiency Virus-1 (HIV-1) genome in an attempt to discover epitope vaccine candidates. To delineate the extent of selection pressure driven by interactions with the immune system at different CTL, T-helper and antibody epitope regions, the levels of sequence identity of 603 HIV-1 epitopes were examined in HIV-1, HIV-2 and SIV reference genomes. Despite the rather high degree of sequence variability of the HIV-1 genome, several epitopes in the Gag and Pol genes were identified as comparatively more conserved than those elsewhere in the genome. The results also showed that CTL epitopes were much more conserved than T-helper and antibody epitopes. Using association rule mining, a data mining technique, it was shown that some of these highly conserved CTL and T-helper epitopes co-occurred in various HIV-1 genomes irrespective of subtype, recombination status or geographical location, suggesting co-evolution and association between these epitopes. These highly conserved and co-evolving epitopes should be considered as potent candidates for a multi-epitope vaccine and/or promising drug targets against HIV-1. In addition, several computational tools were developed to facilitate the analysis of epitope sequences, and the extensive information on the sequence identity of all the HIV-1 epitopes was collated in a database.
This novel strategy in identifying co-evolving epitopes and the detailed information on HIV-1 epitope conservation can further aid in the design of an efficient vaccine.

Committee:

Helen Piontkivska, PhD (Advisor); Christopher Woolverton, PhD (Committee Member); Walter Hoeh, PhD (Committee Member); Ruoming Jin, PhD (Committee Member); Quan Li, PhD (Committee Member)

Subjects:

Bioinformatics; Biology; Evolution and Development

Keywords:

Epitopes; HIV-1; Vaccine candidates; Association rule mining; Epitope analysis tools; Epitope database; Data mining; Bioinformatics; Molecular Evolution

Perikala, Satish Kumar. Evolution of Epitope regions in HIV genome: Delineating Selective Forces acting on Conformational and Linear Epitopes
MS, Kent State University, 2010, College of Arts and Sciences / School of Biomedical Sciences

This study focuses on mechanisms of molecular evolution of different epitope regions in the HIV-1 genome, in particular assessing the extent of nucleotide and amino acid sequence conservation and delineating selective forces acting on conformational and linear epitopes. The patterns of evolutionary change influenced by selective pressures in HIV-1 genomes of subtype B and recombinant forms were assessed by (1) estimating the extent of synonymous and nonsynonymous nucleotide substitutions and (2) estimating the numbers of radical and conservative amino acid changes.

The patterns of nucleotide and amino acid substitutions were estimated at conformational and linear epitopes and were contrasted among different types of epitopes to determine the pattern of selective pressure acting across different types of epitope regions. The results showed a pattern of strong purifying selection acting at the majority of epitope regions in all three major genes surveyed: Gag, Pol and Env. With respect to amino acid substitutions, while conservative changes outnumbered radical changes in the majority of the epitope regions (thus indicating that purifying selection is a dominant selective force removing the deleterious effects of drastic amino acid changes), some HIV-1 genomic regions showed a trend toward an increased number of radical amino acid changes.

Overall, this study showed that conformational epitopes are much more conserved than linear epitopes, and that although conformational epitope regions evolve predominantly through purifying selection, some sites within these regions may also be subject to positive selection. This indicates that, similar to linear epitopes, conformational epitopes are also experiencing conflicting (and potentially episodic) selective pressures between positive selection, which favors mutations that facilitate escape from the host immune system, and purifying selection due to functional and structural constraints acting at the protein level.

Committee:

Helen Piontkivska, PhD (Committee Chair); Gail Fraizer, PhD (Committee Member); Michael Tubergen, PhD (Committee Member)

Subjects:

Bioinformatics

Keywords:

Conformational Epitopes; Linear Epitopes; HIV; Selective Forces; synonymous changes; nonsynonymous changes; Radical changes; Conservative changes

Li, Hui. Algorithms for the selection of optimal spaced seed sets for transposable element identification
Master of Computer Science, Miami University, 2010, Computer Science and Systems Analysis
Spaced seeds have proved to be invaluable in BLAST-like homology searches of large genomic sequences. But the problem of evaluating spaced seeds for this purpose is computationally challenging, while the problem of finding optimal multi-seed sets is known to be NP-hard. In this thesis, we first explored the unpublished details of the dynamic programming algorithm of the Li et al. PatternHunter group for evaluating a multiple spaced seed set, and implemented our own version of this algorithm. We then developed a genetic algorithm to address the problem of optimal multi-seed set selection and found our solutions to be superior to those of the greedy algorithm. Finally, we implemented two additional tools with the goal of applying our results to the problem of transposable element identification.
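
For readers unfamiliar with the term, the following is a minimal sketch of what it means for a spaced seed to "hit" an ungapped alignment. It illustrates the general concept only and is not the thesis's dynamic programming or genetic algorithm. The function name `seed_hits` and the convention of writing a seed as a string of `1` (position must match) and `0` (position is ignored) are assumptions made for this example.

```python
def seed_hits(seed: str, a: str, b: str) -> list[int]:
    """Return the start offsets at which a spaced seed hits the ungapped
    alignment of two equal-length sequences a and b.

    seed is a string over {'1', '0'}: '1' marks a position that must match,
    '0' marks a "don't care" position (illustrative convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Offsets within the seed window that are required to match.
    care = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    # Slide the seed window across every possible start position.
    for start in range(len(a) - len(seed) + 1):
        if all(a[start + i] == b[start + i] for i in care):
            hits.append(start)
    return hits


# The spaced seed tolerates the mismatch at position 2; the contiguous one does not.
print(seed_hits("1101", "ACGTACGT", "ACCTACGA"))
print(seed_hits("111", "ACGTACGT", "ACCTACGA"))
```

A seed set hits an alignment if any one of its seeds does; evaluating a seed set then amounts to estimating how often that happens under a chosen model of alignments.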

Committee:

John Karro, PhD (Advisor); Alton Sanders, PhD (Committee Member); James Kiper, PhD (Committee Member)

Subjects:

Bioinformatics; Computer Science

Keywords:

spaced seed set; genetic algorithm; hit probability; transposable element; greedy algorithm
