Search Results

1. Hossain, Md Ismail Drug Discovery Targeting Bacterial and Viral non-coding RNA: pH Modulation of RNA Stability and RNA-RNA Interactions

Doctor of Philosophy (PhD), Ohio University, 2022, Chemistry and Biochemistry (Arts and Sciences)

Antibiotic resistance is a global threat alongside the ongoing SARS-CoV-2 pandemic. The number of deaths due to antibiotic-resistant infections is increasing at an alarming rate. The COVID-19 pandemic has already claimed millions of lives worldwide. Fighting antibiotic-resistant superbugs and SARS-CoV-2 has become a challenge. A significant amount of research is under way to develop vaccines and small-molecule antiviral and antibacterial therapeutics targeting proteins. Fortunately, novel non-coding regulatory RNA targets have been identified for developing new antibacterial and antiviral drugs, such as the bacterial T-box riboswitch, RNA thermometers, and the viral stem-loop II motif. The T-box riboswitch can control the transcription or translation of amino acid-related genes in bacteria by forming unique interactions between tRNA and mRNA. RNA thermometers (RNATs) are temperature-responsive riboswitches that control translation based on temperature sensing, thus controlling the interaction with the mRNA and 16S rRNA. In Shigella dysenteriae, three RNATs, i.e., ompA, shuT, and shuA, have been discovered. The ompA RNAT controls the translation of outer membrane protein A. The shuT and shuA RNATs control the translation of two proteins that are crucial to the bacterial heme utilization system. The stem-loop II motif (S2M) is a highly conserved RNA element found in most coronaviruses, astroviruses, and picornaviruses that plays a potential role in viral replication and invasion. RNA structure plays a significant role in the regulatory function of all of these potential therapeutic targets. Consequently, it is essential to examine the factors that affect RNA structure and RNA-RNA interactions. Despite having limited building blocks, RNA has diverse functions in cells. Base protonation and protonated base pairs often occur in RNA when it interacts with other biomolecules, and thus could play a critical role in vital biological processes. 
Diff (open full item for complete abstract)

Committee: Jennifer Hines (Advisor) Subjects: Biochemistry; Biology; Genetics

2. Li, Yichao Algorithmic Methods for Multi-Omics Biomarker Discovery

Doctor of Philosophy (PhD), Ohio University, 2018, Electrical Engineering & Computer Science (Engineering and Technology)

The central dogma of molecular biology states that DNA is transcribed into RNA, which is then translated into proteins. The flow of genetic information in time and space is orchestrated by complex regulatory mechanisms. With the advent of modern biotechnology, our understanding of genomics, transcriptomics, and proteomics has deepened. However, bioinformatic tools for biomarker discovery in the different types of omics are still lacking. To address these issues, we developed novel algorithmic methods for three primary omics. Proteins are the main executors of cellular functions. At the proteomic level, we developed machine learning models for early diagnosis of type 2 diabetes based on the abundance of post-translational modifications (PTMs). Our models can interpret mass spectrometry data and perform integrative analysis together with clinical parameters such as HbA1C and fasting plasma glucose. In the results, we identified glycated lysine-141 of haptoglobin as a potential biomarker. Gene regulation is conducted by cis-regulatory elements and transcription factors. At the transcriptomic level, we developed the Emotif Alpha bioinformatic pipeline for DNA motif discovery and selection using RNA-seq, ChIP-seq, and gene homology data. We applied this pipeline to multiple species, including human, mouse, plants, and nematodes. The discovered motifs were validated using a Gaussia Luciferase (GLuc) reporter. The 3D genome architecture in the nucleus involves spatial organization of nuclear bodies such as the histone locus body (HLB). At the 3D genomics level, we developed a bioinformatic pipeline for characterizing locus-specific chromatin interactions. Specifically, we integrated Hi-C, GAM, and SPRITE data and identified a complex chromatin organization signature of the Hist1 cluster in mouse embryonic stem cells (mESCs). In addition, we performed network hub analysis and identified hubs of diverse functions. 
These hubs contained not only histone genes and other active genes, b (open full item for complete abstract)

Committee: Lonnie Welch (Advisor); Razvan Bunescu (Committee Member); Liu Jundong (Committee Member); Frank Drews (Committee Member); Allan Showalter (Committee Member); Shiyong Wu (Committee Member) Subjects: Bioinformatics; Computer Science

3. Kuntala, Prashant Kumar Optimizing Biomarkers From an Ensemble Learning Pipeline

Master of Science (MS), Ohio University, 2017, Electrical Engineering & Computer Science (Engineering and Technology)

Understanding gene expression patterns is crucial in deciphering any observed biological phenotype. Transcription factors (TFs) are proteins that regulate genes by binding to a transcription factor binding site (TFBS) within the promoter region of a gene. Motif discovery is a computational approach that conventionally uses stochastic models, enumeration methods and many other techniques to report candidate motifs (TFBSs). These methods generate similar motifs for a TF for various reasons. Motif selection algorithms successfully identify a small set of motifs that address the specificity problem and coverage problem in motif discovery. However, these selected motifs do not always capture all the binding site preferences for a TF. This study verifies the hypothesis that motif discovery tools generate similar motifs for a transcription factor and that, once these variants (similar motifs) are identified, they can be used to form a super motif set, which may improve the accuracy of motif discovery. This study introduces the concept of the super motif set, a new model to accurately predict the binding sites for a TF. Two heuristic algorithms are introduced to identify super motif sets, utilizing motif selection algorithms and a motif comparison tool. The identified super motif sets capture the biological diversity in TFBS preferences of a TF. The algorithms are evaluated on ChIP-seq data for 54 TF groups from the ENCODE project. Moreover, the proposed algorithms are used to optimize the motifs that are reported by motif selection algorithms and to report super motif sets in three case studies: Chagas disease, pollen-specific HRGP genes in Arabidopsis thaliana, and Shigellosis. On average, two motif variants are added to the selected motifs, which improves the accuracy of motif discovery.

Committee: Frank Drews (Advisor); Lonnie Welch (Committee Chair); Jundong Liu (Committee Member); Erin Murphy (Committee Member) Subjects: Bioinformatics; Biology; Biomedical Research; Computer Engineering; Computer Science; Genetics; Molecular Biology

4. Al-Ouran, Rami Motif Selection: Identification of Gene Regulatory Elements using Sequence Coverage Based Models and Evolutionary Algorithms

Doctor of Philosophy (PhD), Ohio University, 2015, Electrical Engineering & Computer Science (Engineering and Technology)

The accuracy of identifying transcription factor binding sites (motifs) has increased with the use of technologies such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), but this accuracy remains low enough that bioinformaticians and biologists struggle in choosing the right methods for identifying such regulatory elements. Current motif discovery methods typically produce lengthy lists of putative transcription factor binding sites, and a significant challenge lies in how to mine these lists to select a manageable set of candidate sites for experimental validation. Additionally, despite the importance of covering large numbers of genomic sequences, current motif discovery methods do not consider the sequence coverage percentage. To address the aforementioned problems, the motif selection problem is introduced and solved using a coverage based model greedy algorithm and a multi-objective evolutionary algorithm. The motif selection problem aims to produce a concise list of significant motifs which is both accurate and covers a high percentage of the genomic input sequences. The proposed motif selection methods were evaluated using ChIP-seq data from the ENCyclopedia of DNA Elements (ENCODE) project. In addition, the proposed methods were used to identify putative transcription factor binding sites in two case studies: stage specific binding sites in Brugia malayi, and tissue specific binding sites in hydroxyproline-rich glycoprotein (HRGP) genes in Arabidopsis thaliana.

Committee: Lonnie Welch (Advisor) Subjects: Bioinformatics; Computer Science
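
The coverage-based greedy selection described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function name `greedy_motif_selection`, the `motif_hits` input layout (motif mapped to the set of sequence indices it occurs in), and the `target_coverage` stopping rule are all assumptions made for the example, and `n_sequences` is assumed to be positive.

```python
def greedy_motif_selection(motif_hits, n_sequences, target_coverage=0.9):
    """Pick motifs one at a time, always taking the motif that covers the
    most not-yet-covered sequences, until the target coverage is reached.

    motif_hits: dict mapping a motif string to the set of indices of the
    input sequences in which it occurs (hypothetical input layout).
    """
    covered = set()
    selected = []
    remaining = dict(motif_hits)
    while remaining and len(covered) / n_sequences < target_coverage:
        # Motif with the largest gain in newly covered sequences.
        best = max(remaining, key=lambda m: len(remaining[m] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining motif adds coverage
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected, len(covered) / n_sequences


hits = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3}, "D": {0, 1}}
selected, coverage = greedy_motif_selection(hits, n_sequences=5, target_coverage=0.8)
print(selected, coverage)
```

The sketch optimizes coverage only; the abstract also mentions accuracy and a multi-objective evolutionary algorithm, which are not modeled here.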

5. Schmidt, Robert Using Weighted Set Cover to Identify Biologically Significant Motifs

Master of Science (MS), Ohio University, 2015, Computer Science (Engineering and Technology)

One of the greatest challenges of mankind is understanding how living organisms operate, and a key step towards meeting this challenge is identifying how genes are regulated. Promoter regions play a key role in the regulation of genes via sequences of DNA base pairs known as transcription factor binding sites. When a transcription factor binding site is activated, the genes associated with the transcription factor binding site are transcribed, the first step towards creating proteins. The identification of transcription factor binding sites has come a long way with the advancements of next generation sequencing technologies and projects like ENCODE, but still relies on motif discovery algorithms to pinpoint the exact binding sites. In this thesis, the motif discovery problem is explored and a novel method based on weighted set cover is presented to identify the minimal set of motifs, with objective functions, that discriminately cover a set of DNA sequences. The results show that some motif set cover methods can more accurately identify biologically significant motifs than simply selecting the top scoring motifs. However, the weighted set cover algorithms did not perform exceptionally well when compared to standard selection methods, which is attributed to the use of a discriminative motif discovery application. Detailed results can be found at http://motifpipeline.com.

Committee: Lonnie Welch (Advisor); David Juedes (Committee Member); Sonsoles De Lacalle (Committee Member); Frank Drews (Committee Member) Subjects: Bioinformatics; Computer Science
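
To make the weighted set cover idea in the abstract above concrete, here is a minimal sketch of the classic greedy approximation, which repeatedly picks the set with the lowest weight per newly covered element. It is not the thesis's method or objective functions: the names `weighted_set_cover`, `subsets`, and `weights`, and the toy motif labels, are assumptions for illustration.

```python
def weighted_set_cover(universe, subsets, weights):
    """Greedy approximation for weighted set cover.

    universe: iterable of elements to cover (e.g. sequence IDs).
    subsets:  dict name -> set of elements that name covers.
    weights:  dict name -> positive cost of choosing that name.
    Returns the chosen names and any elements left uncovered.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Only sets that still cover something new are candidates.
        candidates = [n for n in subsets
                      if n not in chosen and subsets[n] & uncovered]
        if not candidates:
            break  # the remaining elements cannot be covered
        # Lowest cost per newly covered element wins.
        best = min(candidates,
                   key=lambda n: weights[n] / len(subsets[n] & uncovered))
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen, uncovered


subsets = {"m1": {1, 2, 3, 4}, "m2": {1, 2}, "m3": {3, 4}}
weights = {"m1": 5.0, "m2": 1.0, "m3": 1.0}
print(weighted_set_cover({1, 2, 3, 4}, subsets, weights))
```

In a motif setting the weight might be derived from a motif's score or its hits in background sequences; that mapping is left open here because the abstract does not specify it.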

6. Naik, Ashwini Mining Gene Regulatory Motifs Using the Concept of Sequence Coverage

Master of Science (MS), Ohio University, 2014, Computer Science (Engineering and Technology)

Transcription factors bind to specific sequence elements present in the promoter regions of co-expressed genes and regulate their expression. Genes expressed in an identical manner may have the same transcription factors binding to them, the binding sites being similar with a probable difference of one or two nucleotides. Therefore, a direct inference is that similar sequence elements are present in all the co-expressed genes, with a moderate to high occurrence frequency. These elements are termed motifs. The bioinformatics society currently has a number of effective de-novo motif discovery tools that endeavor to find these motifs through a search for over-represented patterns in gene promoter sequences. Any significant binding sites found through the search procedure will help understand the mechanisms of gene regulation. One significant drawback of current tools is the volume of candidate motifs reported, often numbering in the hundreds or greater, which may result in impractical lab verification in terms of time and resources. This paper presents three methods for solving the problem, namely Random Method, Greedy Method and Hill climbing Method, which substantially reduce the list of candidate motifs to those showing greatest potential.

Committee: Lonnie Welch Dr. (Advisor) Subjects: Bioinformatics; Computer Science

7. Wolfe, Richard In Silico Discovery of Pollen-specific Cis-regulatory Elements in the Arabidopsis Hydroxyproline-Rich Glycoprotein Gene Family

Master of Science (MS), Ohio University, 2014, Computer Science (Engineering and Technology)

Within every cell is a copy of an organism's DNA. This copy of DNA has all of the information needed for the cell to express every gene in the organism's genome. Although each cell is capable, individual cells do not express every gene in their DNA. The genes expressed by a cell are regulated by transcription factors (TFs) that bind to a transcription factor binding site (TFBS) located in the promoter region of the gene. TFs must bind to TFBSs in order for a gene to be expressed. Tissues are groups of cells that perform a specific function; therefore, the cells of a specific tissue express genes that are not expressed in other cell types. Hydroxyproline-rich glycoproteins (HRGPs) are proteins that are found in the plant cell wall, and they can be further classified according to the degree to which they are glycosylated as arabinogalactan-proteins (AGPs), extensins (EXTs), and proline-rich proteins (PRPs). Currently, the TFBSs for EXTs, AGPs, and PRPs expressed in the pollen cells of Arabidopsis are unknown, and their discovery will provide a better understanding of the regulatory and evolutionary processes of these genes. Motif discovery and other bioinformatics tools were used to search the promoter regions of EXT, AGP, and PRP genes expressed in the Arabidopsis pollen cells and select motifs that are putative TFBSs. The best set of motifs discovered as putative pollen-specific TFBSs is GCYAMGKA, ACTMGGAA, CATSAAAMGA, and ATTKGKTTCT. Of the 8 pollen-specific promoters, GCYAMGKA occurs in 5 promoters, ACTMGGAA occurs in 2 promoters, CATSAAAMGA occurs in 4 promoters, and ATTKGKTTCT occurs in 3 promoters. Also, all of the 8 HRGP pollen-specific promoters have an occurrence of at least one of these four motifs, and none of the four motifs occurs in the 84 HRGP promoters of genes not expressed in pollen cells.

Committee: Lonnie Welch (Advisor) Subjects: Bioinformatics; Computer Science
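
The motifs reported in the abstract above use IUPAC degenerate nucleotide codes (for example, Y is C or T, M is A or C, K is G or T, S is C or G). A rough sketch of how such a motif can be matched against promoter sequences is shown below. The helper names and the toy promoter sequences are made up for illustration, and the sketch scans the forward strand only, which may differ from what the thesis did.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as GCYAMGKA into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def promoters_with_motif(motif, promoters):
    """Return the names of promoters whose forward strand contains the motif.

    promoters: dict name -> DNA sequence (hypothetical toy input).
    """
    pattern = motif_to_regex(motif)
    return [name for name, seq in promoters.items()
            if pattern.search(seq.upper())]


toy_promoters = {"p1": "TTGCTAAGGATT", "p2": "ACGTACGTACGT"}
print(promoters_with_motif("GCYAMGKA", toy_promoters))
```

Counting the length of the returned list for each motif gives per-motif promoter counts of the kind quoted in the abstract.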

8. Li, Lizhi Graphic Network based Methods in Discovering TFBS Motifs

Master of Science, The Ohio State University, 2012, Biophysics

Finding motifs of transcription factor binding sites (TFBS) is essential to understanding many biological processes in a cell. Currently, the algorithms for discovering motifs can be divided into three categories: word enumeration methods, probabilistic methods, and newly developed graphic network based methods. Graphic network based methods show advantages over the other two categories of algorithms in prediction accuracy, sensitivity and specificity. This thesis gives a comprehensive overview of the main motif discovery methods now in use and focuses especially on introducing graphic network based methods. In addition, a study discovering the TFBS motifs of E2F1, a well-known transcription factor, is performed by applying graphic network based algorithms.

Committee: Kun Huang (Advisor); Victor Jin (Committee Member) Subjects: Biophysics

9. Kurz, Kyle A Parallel, High-Throughput Framework for Discovery of DNA Motifs

Master of Science (MS), Ohio University, 2010, Computer Science (Engineering and Technology)

The search for genomic information has just begun. New genomes are sequenced daily, and each brings new challenges and knowledge to the scientific table that must be carefully mined and studied to glean out every possible bit of information. The amount of data created during genomic sequencing is simply too great for researchers to handle, creating a need for computational tools capable of processing the genomic input and analyzing it for information. The area of bioinformatics focuses on this combination of computer science and biology, bringing useful software applications to the table in an effort to ease the workload of biologists. One specific area of interest to biological researchers is the study of DNA words or motifs as they relate to gene regulation. These regulatory elements may be transcription factor binding sites (TFBS), which bind RNA polymerase II to the DNA strand, or enhancer/silencer sequences that up- and down-regulate transcription of the gene to which they are related by binding specific proteins. Many tools such as Weeder [43], WordSpy [65] and YMF [55] are currently available for the study of over- and under-represented words in a DNA sequence, a trait which is believed to be useful in identification of these regulatory elements. These tools all perform similar tasks by enumerating all words, or substrings, found in their input, then scoring and ranking these resulting words for presentation to the user. Optionally, many tools also cluster groups of words together to form degenerate motifs which allow for evolutionary and environmental variation in the binding site. The Open Word Enumeration Framework (OWEF), presented in this thesis, provides a new framework on which DNA word enumeration tools can be built. 
The OWEF framework provides a set of abstract base classes representing the core stages of a word enumeration tool and defines a set of standard interfaces for each stage, allowing multiple algorithmic implementations of these base classes to (open full item for complete abstract)

Committee: Lonnie Welch PhD (Committee Chair); Frank Drews PhD (Committee Member); Chang Liu PhD (Committee Member); Robert Colvin PhD (Committee Member) Subjects: Bioinformatics; Computer Science
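
The enumerate-then-rank step that the abstract above attributes to word enumeration tools can be illustrated with a very small sketch. It counts every word (substring) of a fixed length `k` and ranks words by raw count; real tools score words against a background model, and nothing here reflects the actual OWEF interfaces. The function name `enumerate_words` is an assumption for the example.

```python
from collections import Counter


def enumerate_words(sequences, k):
    """Count every length-k word (substring) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


counts = enumerate_words(["ACGTACG"], k=3)
# Rank words by raw frequency, most frequent first.
print(counts.most_common(2))
```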

Ohio Department of Higher Education
