Search Results (1 - 4 of 4 Results)


Liu, Yating. Motif Selection via a Tabu Search Solution to the Set Cover Problem
Master of Science (MS), Ohio University, 2017, Computer Science (Engineering and Technology)
Transcription factors (TFs) regulate gene expression through interaction with specific DNA regions, called transcription factor binding sites (TFBSs). Identifying TFBSs can help in understanding the mechanisms of gene regulation and the biology of human diseases. Motif discovery is the traditional method for discovering TFBSs. However, current motif discovery tools tend to generate a number of motifs that is too large to permit biological validation. To address this problem, the motif selection problem is introduced. The aim of the motif selection problem is to select, from the discovered motifs, a small set of motifs that covers a high percentage of the genomic input sequences. Tabu search, a metaheuristic search method based on local search, is introduced to solve the motif selection problem. The performance of the three proposed motif selection methods, tabu-SCP, tabu-PSC and tabu-PNPSC, was evaluated by applying them to ChIP-seq data from the ENCyclopedia of DNA Elements (ENCODE) project. Motif selection was performed on 46 factor groups, which include 158 human ChIP-seq data sets. The results of the three motif selection methods were compared with the greedy method, the enrichment method and relaxed integer linear programming (RILP). Tabu-PNPSC selected the smallest set of motifs with the highest overall accuracy. The average number of selected motifs was 1.37 and the average accuracy was 72.47%. Tabu-PNPSC was also used to identify putative regulatory element binding sites that respond to the overproduction of the small RNA RyfA1 in the bacterium Shigella dysenteriae. Six motifs were selected by tabu-PNPSC and the overall accuracy was 75.5%.

Committee:

Lonnie Welch (Advisor)

Subjects:

Bioinformatics; Computer Science

Keywords:

motif selection; tabu search; set cover problem
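
Illustrative sketch (not taken from the thesis): the abstract above frames motif selection as a set cover problem solved with tabu search. The Python below is a minimal, generic tabu search over a penalized set cover objective, assuming each motif is represented by the set of sequence indices it matches. The function name `tabu_set_cover`, the `penalty` weight, the `tenure` length and the toy data are all hypothetical choices made for this example; the thesis's tabu-SCP, tabu-PSC and tabu-PNPSC formulations are not reproduced here.

```python
def tabu_set_cover(coverage, n_sequences, iterations=200, tenure=3, penalty=10):
    """Generic tabu search for a penalized set cover (illustrative only).

    coverage: dict mapping motif name -> set of covered sequence indices
              (indices are assumed to lie in range(n_sequences)).
    Returns (sorted list of selected motifs, cost of that selection).
    """
    motifs = sorted(coverage)

    def cost(selection):
        # Assumed objective: heavy penalty per uncovered sequence,
        # plus one unit per selected motif to keep the set small.
        covered = set()
        for m in selection:
            covered |= coverage[m]
        return penalty * (n_sequences - len(covered)) + len(selection)

    current = frozenset()
    best, best_cost = current, cost(current)
    tabu_until = {}  # motif -> first iteration at which it may be flipped again

    for it in range(iterations):
        candidates = []
        for m in motifs:
            neighbour = current ^ {m}  # flip motif m in or out of the selection
            c = cost(neighbour)
            is_tabu = tabu_until.get(m, 0) > it
            # Aspiration rule: a tabu move is allowed if it beats the best so far.
            if not is_tabu or c < best_cost:
                candidates.append((c, m, neighbour))
        if not candidates:
            continue
        # Take the best admissible move, even if it worsens the current cost.
        c, m, current = min(candidates, key=lambda t: (t[0], t[1]))
        tabu_until[m] = it + 1 + tenure
        if c < best_cost:
            best, best_cost = current, c
    return sorted(best), best_cost


# Toy data (hypothetical): four motifs over six sequences.
toy_coverage = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {0, 1, 2, 3, 4}}
selected, selected_cost = tabu_set_cover(toy_coverage, n_sequences=6)
print(selected, selected_cost)
```

The tabu list is what distinguishes this from plain local search: a recently flipped motif cannot be flipped back for `tenure` iterations, which lets the search leave local optima instead of cycling.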

Chen, Liang. Motif Selection Using Simulated Annealing Algorithm with Application to Identify Regulatory Elements
Master of Science (MS), Ohio University, 2018, Computer Science (Engineering and Technology)
Modern research on gene regulation and disorder-related pathways utilizes tools such as microarrays and RNA-Seq to analyze changes in the expression levels of large sets of genes. In silico motif discovery was performed based on the gene expression profile data, which generated a large set of candidate motifs (usually hundreds or thousands of motifs). How to pick a set of biologically meaningful motifs from the candidate motif set is a challenging biological and computational problem. As a computational problem, it can be modeled as the motif selection problem (MSP). Solutions to the motif selection problem give biologists direct help in finding transcription factors (TFs) that are strongly related to specific pathways and in gaining insight into the relationships between genes. This study implemented an algorithm based on the simulated annealing (SA) optimization algorithm for the motif selection problem, and investigated the properties of the implemented algorithm on real-world datasets (ENCODE project data). The results of the evaluation on ENCODE datasets indicate that the simulated annealing algorithm is well suited to solving the motif selection problem. The performance of the simulated annealing algorithm can be tuned through its parameters to fit specific requirements. Future improvements may be achieved by extending the algorithm model (adaptive simulated annealing) and applying a high-dimensional cost function.

Committee:

Lonnie Welch (Advisor); Frank Drews (Committee Member); Razvan Bunescu (Committee Member)

Subjects:

Computer Science; Genetics; Molecular Biology

Keywords:

regulatory element; transcription factor; motif; motif selection; MSP; ENCODE; simulated annealing
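
Illustrative sketch (not taken from the thesis): the abstract above applies simulated annealing to the motif selection problem and notes that its behavior can be tuned through parameters. The Python below is a minimal, generic simulated annealing loop under assumed choices: each motif is a set of covered sequence indices, the cost is a penalty per uncovered sequence plus the number of selected motifs, and the temperature follows a geometric cooling schedule. The function name `anneal_motif_selection`, the parameters `t_start`, `t_end`, `penalty` and `seed`, and the toy data are hypothetical; the thesis's actual cost function and schedule are not given in the abstract.

```python
import math
import random


def anneal_motif_selection(coverage, n_sequences, steps=2000,
                           t_start=5.0, t_end=0.01, penalty=10, seed=0):
    """Generic simulated annealing for motif selection (illustrative only).

    coverage: dict mapping motif name -> set of covered sequence indices
              (indices are assumed to lie in range(n_sequences)).
    Returns (sorted list of selected motifs, cost of that selection).
    """
    rng = random.Random(seed)
    motifs = sorted(coverage)

    def cost(selection):
        # Assumed objective: penalize uncovered sequences and selection size.
        covered = set()
        for m in selection:
            covered |= coverage[m]
        return penalty * (n_sequences - len(covered)) + len(selection)

    current = frozenset()
    current_cost = cost(current)
    best, best_cost = current, current_cost

    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        neighbour = current ^ {rng.choice(motifs)}  # flip one random motif
        delta = cost(neighbour) - current_cost
        # Metropolis criterion: always accept improvements, and accept
        # worsening moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = neighbour, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return sorted(best), best_cost


# Toy data (hypothetical): four motifs over six sequences.
toy_coverage = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {0, 1, 2, 3, 4}}
print(anneal_motif_selection(toy_coverage, n_sequences=6))
```

In this sketch the tunable parameters are the start and end temperatures, the number of steps and the penalty weight: a slower cooling schedule explores more selections at the cost of run time, and a larger penalty favors coverage over a small motif set.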

Al-Ouran, Rami. Motif Selection: Identification of Gene Regulatory Elements using Sequence Coverage Based Models and Evolutionary Algorithms
Doctor of Philosophy (PhD), Ohio University, 2015, Electrical Engineering & Computer Science (Engineering and Technology)
The accuracy of identifying transcription factor binding sites (motifs) has increased with the use of technologies such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), but this accuracy remains low enough that bioinformaticians and biologists struggle to choose the right methods for identifying such regulatory elements. Current motif discovery methods typically produce lengthy lists of putative transcription factor binding sites, and a significant challenge lies in how to mine these lists to select a manageable set of candidate sites for experimental validation. Additionally, despite the importance of covering large numbers of genomic sequences, current motif discovery methods do not consider the sequence coverage percentage. To address these problems, the motif selection problem is introduced and solved using a greedy algorithm built on a coverage-based model and a multi-objective evolutionary algorithm. The motif selection problem aims to produce a concise list of significant motifs that is both accurate and covers a high percentage of the genomic input sequences. The proposed motif selection methods were evaluated using ChIP-seq data from the ENCyclopedia of DNA Elements (ENCODE) project. In addition, the proposed methods were used to identify putative transcription factor binding sites in two case studies: stage-specific binding sites in Brugia malayi, and tissue-specific binding sites in hydroxyproline-rich glycoprotein (HRGP) genes in Arabidopsis thaliana.

Committee:

Lonnie Welch (Advisor)

Subjects:

Bioinformatics; Computer Science

Keywords:

Motif selection; motif discovery; ENCODE
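
Illustrative sketch (not taken from the dissertation): the abstract above mentions a coverage-based greedy algorithm for selecting a concise motif list that covers a high percentage of the input sequences. The Python below shows the standard greedy set cover heuristic under assumed inputs: each motif is a set of covered sequence indices, and selection stops once a target coverage fraction is reached. The function name `greedy_motif_selection`, the `target` parameter and the toy data are hypothetical, and the dissertation's actual model may weigh additional factors that this sketch ignores.

```python
def greedy_motif_selection(coverage, n_sequences, target=0.9):
    """Greedy set cover heuristic for motif selection (illustrative only).

    coverage: dict mapping motif name -> set of covered sequence indices.
    target:   fraction of sequences to cover before stopping (an assumption).
    Returns (list of selected motifs in pick order, achieved coverage fraction).
    """
    needed = target * n_sequences
    covered = set()
    selected = []
    remaining = dict(coverage)

    while len(covered) < needed and remaining:
        # Pick the motif that covers the most still-uncovered sequences;
        # ties are broken by motif name because the names are sorted first.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining motif adds coverage
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected, len(covered) / n_sequences


# Toy data (hypothetical): four motifs over eight sequences.
toy_coverage = {"m1": {0, 1, 2, 3}, "m2": {3, 4, 5}, "m3": {5, 6}, "m4": {0, 1}}
print(greedy_motif_selection(toy_coverage, n_sequences=8, target=0.75))
# prints (['m1', 'm2'], 0.75)
```

The greedy rule trades optimality for speed: it never revisits a choice, so it can pick more motifs than the smallest possible cover, which is one reason search-based methods are also applied to this problem.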

Kuntala, Prashant Kumar. Optimizing Biomarkers From an Ensemble Learning Pipeline
Master of Science (MS), Ohio University, 2017, Electrical Engineering & Computer Science (Engineering and Technology)
Understanding gene expression patterns is crucial to deciphering observed biological phenotypes. Transcription factors (TFs) are proteins that regulate genes by binding to a transcription factor binding site (TFBS) within the promoter region of a gene. Motif discovery is a computational approach that conventionally uses stochastic models, enumeration methods and many other techniques to report candidate motifs (TFBSs). These methods generate similar motifs for a TF for various reasons. Motif selection algorithms successfully identify a small set of motifs that addresses the specificity problem and the coverage problem in motif discovery. However, these selected motifs do not always capture all the binding site preferences of a TF. This study verifies the hypothesis that motif discovery tools generate similar motifs for a transcription factor and that, once these variants (similar motifs) are identified, they can be used to form a super motif set, which may improve the accuracy of motif discovery. This study introduces the concept of the super motif set, a new model to accurately predict the binding sites for a TF. Two heuristic algorithms are introduced to identify super motif sets, utilizing motif selection algorithms and a motif comparison tool. The identified super motif sets capture the biological diversity in the TFBS preferences of a TF. The algorithms are evaluated on ChIP-seq data for 54 TF factor groups from the ENCODE project. Moreover, the proposed algorithms are used to optimize the motifs that are reported by motif selection algorithms and to report super motif sets in three case studies: Chagas disease, pollen-specific HRGP genes in Arabidopsis thaliana, and shigellosis. On average, two motif variants are added to the selected motifs, which improves the accuracy of motif discovery.

Committee:

Frank Drews (Advisor); Lonnie Welch (Committee Chair); Jundong Liu (Committee Member); Erin Murphy (Committee Member)

Subjects:

Bioinformatics; Biology; Biomedical Research; Computer Engineering; Computer Science; Genetics; Molecular Biology

Keywords:

Motif Discovery; Motif Selection; Super Motif Set; Transcription Factor; Heuristic algorithm; DNA Motifs; Ensemble Learning; Genomics; ENCODE; Chagas disease; Shigellosis; Bioinformatics; Computational Biology