Search Results

(Total results 128)

  • 1. GUDIVADA, RANGA CHANDRA DISCOVERY AND PRIORITIZATION OF BIOLOGICAL ENTITIES UNDERLYING COMPLEX DISORDERS BY PHENOME-GENOME NETWORK INTEGRATION

    PhD, University of Cincinnati, 2007, Engineering : Biomedical Engineering

    An important goal for biomedical research is to elucidate causal and modifier networks of human disease. While integrative functional genomics approaches have shown success in the identification of biological modules associated with normal and disease states, a critical bottleneck is representing knowledge capable of encompassing asserted or derivable causality mechanisms. Both single gene and more complex multifactorial diseases often exhibit several phenotypes and a variety of approaches suggest that phenotypic similarity between diseases can be a reflection of shared activities of common biological modules composed of interacting or functionally related genes. Thus, analyzing the overlaps and interrelationships of clinical manifestations of a series of related diseases may provide a window into the complex biological modules that lead to a disease phenotype. In order to evaluate our hypothesis, we are developing a systematic and formal approach to extract phenotypic information present in textual form within Online Mendelian Inheritance in Man (OMIM) and Syndrome DB databases to construct a disease - clinical phenotypic feature matrix to be used by various clustering procedures to find similarity between diseases. Our objective is to demonstrate relationships detectable across a range of disease concept types modeled in UMLS to analyze the detectable clinical overlaps of several Cardiovascular Syndromes (CVS) in OMIM in order to find the associations between phenotypic clusters and the functions of underlying genes and pathways. Most of the current biomedical knowledge is spread across different databases in different formats and mining these datasets leads to large and unmanageable results. Semantic Web principles and standards provide an ideal platform to integrate such heterogeneous information and could allow the detection of implicit relations and the formulation of interesting hypotheses. 
We implemented a page-ranking algorithm onto Semantic Web to prioriti (open full item for complete abstract)

    Committee: Dr. Bruce Aronow (Advisor) Subjects:
  • 2. Kurz, Kyle A Parallel, High-Throughput Framework for Discovery of DNA Motifs

    Master of Science (MS), Ohio University, 2010, Computer Science (Engineering and Technology)

    The search for genomic information has just begun. New genomes are sequenced daily, and each brings new challenges and knowledge to the scientific table that must be carefully mined and studied to glean every possible bit of information. The amount of data created during genomic sequencing is simply too great for researchers to handle, creating a need for computational tools capable of processing the genomic input and analyzing it for information. The area of bioinformatics focuses on this combination of computer science and biology, bringing useful software applications to the table in an effort to ease the workload of biologists. One specific area of interest to biological researchers is the study of DNA words or motifs as they relate to gene regulation. These regulatory elements may be transcription factor binding sites (TFBS), which bind RNA polymerase II to the DNA strand, or enhancer/silencer sequences that up- and down-regulate transcription of the gene to which they are related by binding specific proteins. Many tools such as Weeder [43], WordSpy [65] and YMF [55] are currently available for the study of over- and under-represented words in a DNA sequence, a trait which is believed to be useful in identification of these regulatory elements. These tools all perform similar tasks by enumerating all words, or substrings, found in their input, then scoring and ranking these resulting words for presentation to the user. Optionally, many tools also cluster groups of words together to form degenerate motifs which allow for evolutionary and environmental variation in the binding site. The Open Word Enumeration Framework (OWEF), presented in this thesis, provides a new framework on which DNA word enumeration tools can be built.
The OWEF framework provides a set of abstract base classes representing the core stages of a word enumeration tool and defines a set of standard interfaces for each stage, allowing multiple algorithmic implementations of these base classes to (open full item for complete abstract)

    Committee: Lonnie Welch PhD (Committee Chair); Frank Drews PhD (Committee Member); Chang Liu PhD (Committee Member); Robert Colvin PhD (Committee Member) Subjects: Bioinformatics; Computer Science
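
    The enumerate, score, and rank steps described in the abstract above can be sketched as follows. This is a minimal illustration, not OWEF's actual code or interfaces: the function names are hypothetical, and the scoring rule (observed count divided by the count expected if bases were independent) is an assumed stand-in for whatever scoring the real tools use.

```python
from collections import Counter


def enumerate_words(sequence, k):
    """Count every length-k substring (word) in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score_words(counts, sequence, k):
    """Score each word as observed / expected count.

    Assumption: the expected count comes from a simple background model
    in which bases occur independently at their observed frequencies.
    """
    seq = sequence.upper()
    n = len(seq)
    base_freq = {base: seq.count(base) / n for base in set(seq)}
    positions = n - k + 1  # number of word start positions
    scores = {}
    for word, observed in counts.items():
        probability = 1.0
        for base in word:
            probability *= base_freq[base]
        scores[word] = observed / (probability * positions)
    return scores


def rank_words(scores):
    """Order words from most to least over-represented (ties by name)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dna = "ACGTACGT"  # toy input, not real genomic data
    counts = enumerate_words(dna, 2)
    for word, score in rank_words(score_words(counts, dna, 2)):
        print(word, counts[word], round(score, 3))
```

    A score above 1 marks a word seen more often than the background model predicts, and a score below 1 marks an under-represented word. Real tools typically use richer background models, such as Markov chains, than the independence assumption used here.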
  • 3. Tursi, Amanda Application and Development of Computational Tools for Cytometry

    PhD, University of Cincinnati, 2024, Medicine: Biomedical Informatics

    The advent of single-cell profiling allows for large-scale cellular characterization at a detailed level that was unthinkable only two decades ago. Various omics technologies have flourished and researchers use them in conjunction with a variety of biomedical fields. Notably, the integration of omics approaches with immunology research has been beneficial in uncovering the full complexity of the immune system. Proteomic profiling of cells has proven particularly useful in characterizing immune cell types. While different instruments and methods exist for protein detection on or within cells, a particularly popular tool in enabling single-cell profiling is flow cytometry. Although conventional flow cytometers have been used since well before the turn of the 21st century, offshoot technologies such as spectral flow cytometry and mass cytometry were formed within the last two decades. Advancements in cytometers have increased the number of measurable parameters, improved speed, and reduced costs. Consequently, cytometry is performed in an increasingly high-throughput and high-dimensional manner. This results in more complex data that require novel computational approaches to aid in processing and interpretation. The research presented here integrates programmatic approaches to advance immunological research through cell profiling via flow and mass cytometry. The viability of a computational workflow is exhibited through its usage on two separate research initiatives. One project aimed to elucidate the relationship between food allergy and vitamin D levels in a cohort of infants. The cytometry data and corresponding demographic information were also described and publicly shared to promote data reuse. The second study described here examined PFAS levels in adults and their impact on the immune system. Each study used computational methods combined with immunological knowle (open full item for complete abstract)

    Committee: Sandra Andorf Ph.D. (Committee Chair); Tamara Tilburgs Ph.D. (Committee Member); Yan Xu Ph.D. (Committee Member); Krishna Roskin PhD (Committee Member) Subjects: Bioinformatics
  • 4. Hoskins, Emily Leveraging multi-omics and big data to detect and describe rare genomic alterations in cancer that can potentially be targeted with precision therapies

    Doctor of Philosophy, The Ohio State University, 2024, Biomedical Sciences

    Cancer is a complex disease that arises from acquired mutations in normal cells, resulting in uncontrolled cell growth. The body harbors defensive mechanisms to prevent cancer cells from proliferating by repairing mistakes in DNA replication and eradicating abnormal pre-cancerous cells. However, cancer cells can acquire mutations that allow them to surpass this defensive barrier and continue to develop into a malignant disease. Since cancer develops from diverse alterations, the tumor biology of each patient is unique, creating a need for customizable treatments. Fortunately, genomic sequencing has enabled researchers and clinicians to identify and target the genomic alterations driving individual patients' cancers. Targeted therapies, including immunotherapy and tyrosine kinase inhibitors, have significantly improved the overall survival and quality of life of cancer patients. Established biomarkers associated with good response to a specific treatment help guide treatment. For example, established biomarkers for immune checkpoint inhibitor therapy, a type of immunotherapy, include tumor mutational burden, microsatellite instability, and immunohistochemistry of programmed cell death ligand 1 (PD-L1). Many gene fusions and short variants involving kinase genes, including ALK, ROS1, FGFR1, FGFR2, FGFR3, RET, NTRK1, NTRK2, and NTRK3, can be targeted with clinically-approved tyrosine kinase inhibitors (TKIs). However, not all patients are eligible for targeted therapy. In this work, we strive to expand this eligibility by identifying and describing oncogenic alterations that may be clinically targetable, with a focus on a specific type of genomic alteration. Structural variations, which are rare chromosomal rearrangements that can lead to carcinogenesis, include copy number variations, large deletions, tandem duplications, large insertions, and translocations, otherwise known as gene fusions.
Here, leveraging large data sources, we identify and describe rare oncogeni (open full item for complete abstract)

    Committee: Sameek Roychowdhury (Advisor); Daniel Stover (Committee Member); Lianbo Yu (Committee Member); Robert Baiocchi (Committee Member); Amanda Toland (Committee Member) Subjects: Bioinformatics; Biology; Genetics; Medicine
  • 5. Yan, Ming Elucidating the Under-explored Genomic Diversity and Metabolic Potential of the Rumen Microbiome through Multi-Omics Approaches

    Doctor of Philosophy, The Ohio State University, 2024, Animal Sciences

    The rumen hosts a diverse array of prokaryotic (bacteria and archaea) and eukaryotic (fungi and protozoa) microbes. Collectively, they hydrolyze complex plant cell wall materials into simple sugars, which are further fermented into VFA, representing a substantial source of the host's energy needs. By incorporating inorganic ammonia generated from feed protein and urea, rumen microbes also provide a significant portion of the host's protein requirements. As regulators of the microbial ecosystem, rumen viruses (bacteriophages and eukaryotic viruses) also influence rumen fermentation and microbial protein synthesis. They achieve this by directly lysing microbes, thereby modulating microbial composition or by modifying the metabolism of infected bacterial cells. Additionally, they drive co-evolution between microbes and viruses, acting as vectors for horizontal gene transfer or through dynamic defense and counterdefense interactions with microbes. The anaerobic microbial cultivation techniques developed by Robert Hungate enable rumen microbiologists to explore the diverse spectrum of rumen microbial physiology and metabolism. However, despite continuous efforts in anaerobic cultivation, the culturable rumen microbes (including viruses) represent only a limited fraction of the overall diversity. Moreover, microbial cultures, whether monocultures or mixed cultures, fail to fully replicate the intricate microbial interactions observed in vivo, such as cross-feeding and predatory conditions. Fortunately, multi-omics technologies complement traditional culture-dependent analyses, enabling us to explore microbial ecology by uncovering the genomes (via genome-resolved metagenomics) and metabolism (via metatranscriptomics, metaproteomics and enzymatic activities) of the unculturable majority. 
Utilizing advancements in multi-omics and bioinformatics, this research aims to bridge the gap in rumen microbial genomics within the context of microbial ecology and rumen fermentation (open full item for complete abstract)

    Committee: Zhongtang Yu Dr. (Advisor); Jeffrey Firkins Dr. (Committee Member); Tansol Park Dr. (Committee Member); Chanhee Lee Dr. (Committee Member); Alejandro Relling Dr. (Committee Member) Subjects: Animal Sciences; Bioinformatics; Microbiology
  • 6. Parsons, Danielle Big data in biodiversity science: using specimen-based biodiversity data to elucidate hidden diversity

    Doctor of Philosophy, The Ohio State University, 2024, Evolution, Ecology and Organismal Biology

    While over 1 million species have been formally described, this number likely represents just 1-10% of the actual number of existing species. Because species are the basic units of biological classification, the challenges posed by this discrepancy span beyond the field of systematic biology, with implications in conservation, biodiversity, evolutionary theory, and the prevention of zoonotic disease. Our ability to effectively address these challenges is hindered by the presence of cryptic species, defined as two or more distinct species that have been misclassified as a single species due to physical similarity. Fortunately, while the common notion of species discovery usually involves expeditions to remote regions, many undescribed cryptic species are already present in natural history collections. These collections provide critical data for the biological sciences by housing not only physical specimens, but also a wide variety of associated metadata (e.g., locality, life history traits, genetic sequences, etc.). My dissertation work will facilitate species discovery by providing a framework through which this information can be utilized to answer longstanding questions about cryptic diversity. In Chapter 2, I use publicly available genetic data to estimate levels of cryptic diversity in mammals (order: Mammalia) and develop a machine learning model using trait data to identify characteristics that make certain mammals more likely to harbor cryptic species. I find that small-bodied taxa with large, climatically variable ranges are most likely to contain cryptic diversity. In Chapter 3, I apply this framework to salamanders (clade: Caudata), a group which differs from mammals in several key aspects, including species richness and sampling intensity. 
While I was not able to pinpoint a specific predictor variable as the most important for predicting undescribed diversity, results of the machine learning model suggest that cryptic diversity in salamanders is likel (open full item for complete abstract)

    Committee: Bryan Carstens (Advisor); Ryan Norris (Committee Member); Andreas Chavez (Committee Member) Subjects: Bioinformatics; Biology; Genetics; Molecular Biology; Museum Studies; Organismal Biology; Zoology
  • 7. Powell, Joseph Integration of Digital Health Resources for Deep Phenotypic Remote Monitoring of Patient Health

    Doctor of Philosophy, Case Western Reserve University, 2024, Systems Biology and Bioinformatics

    The rapid advancement of personal wearable devices has allowed for the inception of novel applications of deep phenotyping for characterization of disease. The need to advance deep phenotyping and analysis methods for personalized wearable devices is crucial to the advancement of personalized remote patient monitoring. We developed an end-to-end digital health infrastructure designed for fast, secure, and effective patient recruitment, data collection, and analysis reporting. We analyzed the efficacy of patient recruitment through our end-to-end patient interface and found that recruitment methods from traditional means such as through clinical sources and university sources resulted in more consents ([0.015, 0.030]; p << 0.001) and more active patients initially (χ² = 23.65; p < 0.005). Additionally, we noted that online recruitment through Facebook advertising and Google advertising produced a more ethnically diverse population compared to regional clinical recruitment (χ² = 231.47; p < 0.001). We investigated the use of the previously reported NightSignal algorithm, originally developed for SARS-CoV-2 detection, on the detection of abnormal resting heart rate observations for cardiothoracic surgical patients collected through our infrastructure. We found the NightSignal algorithm had a sensitivity of 81%, a specificity of 75%, a negative predictive value of 97%, and a positive predictive value of 28% for the detection of postoperative events. When compared to patients who did not experience a postoperative event, patients who did experience a postoperative event had a significantly higher proportion of red alerts issued by the NightSignal algorithm during the first 30 days after surgery (0.325 vs. 0.063; p < 0.05). Finally, we investigated the potential for latent subgroup identification using physiological parameters generated from personal wearable devices. We found latent subgroups at 30 days, 60 days, and 90 days post-operatively.
Each latent group was we (open full item for complete abstract)

    Committee: Mark Cameron (Committee Chair); Jing Li (Committee Member); Wai Hong Wilson Tang (Committee Member); Xiao Li (Advisor) Subjects: Bioinformatics; Biomedical Research
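
    The four detection metrics quoted in the abstract above (sensitivity, specificity, positive and negative predictive value) all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions. The counts in the usage example are hypothetical and are not the study's data; they were chosen only to show how a rare event can give a high negative predictive value alongside a low positive predictive value.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp: events correctly flagged       fp: non-events flagged in error
    tn: non-events correctly cleared   fn: events that were missed
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true events detected
        "specificity": tn / (tn + fp),  # share of non-events cleared
        "ppv": tp / (tp + fp),          # share of alerts that were real
        "npv": tn / (tn + fn),          # share of all-clears that were right
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (10 events in 100 patients).
    metrics = diagnostic_metrics(tp=8, fp=20, tn=70, fn=2)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

    With these assumed counts, most alerts are false alarms simply because non-events greatly outnumber events, even though most true events are caught.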
  • 8. Zhao, Ziyin Deciphering Transcriptomic Signatures in Alzheimer's Disease CSF Leukocytes through Single-Cell Sequencing Analysis

    Master of Sciences, Case Western Reserve University, 2024, Systems Biology and Bioinformatics

    Alzheimer's disease (AD) is the most common neurodegenerative disease and the leading cause of dementia. Cerebrospinal fluid (CSF) is a neuroprotective fluid that carries brain metabolites away from the blood-brain barrier. It is an optimal sample for studying neuroinflammation in central nervous system diseases. However, the role of cells carried in CSF remains underexplored. In this thesis, we investigated the single-cell RNA sequencing data of leukocytes in CSF. The ratio of CD11B+ cells versus T cells increased in amyloid-healthy individuals and gradually decreased with AD progression. Differential expression analysis of the same leukocyte subtype in different AD stages showed that CCL3 and its variants are up-regulated in monocytes from MCI to AD. IL1B is down-regulated in IM and NCM in MCI patients vs healthy individuals. Pathway enrichment analysis shows that interferon-gamma response, interferon-alpha response, and allograft rejection pathways are up-regulated through AD progression in most cell types.

    Committee: Gurkan Bebek (Committee Chair); Cheryl Cameron (Committee Member); Jagan Pillai (Committee Member) Subjects: Bioinformatics; Biomedical Research; Immunology; Neurobiology
  • 9. Salyer, Owen A multi-cancer study of CpG island methylator phenotype and its complex set of mutational associations

    Bachelor of Science (BS), Ohio University, 2024, Computer Science

    CpG sites occur at an irregularly high frequency at regions of the genome called CpG islands; in normal samples, CpG islands are typically unmethylated. When a number of these CpG islands are instead hypermethylated in a cancer sample the sample is referred to as possessing CpG island methylator phenotype (CIMP). CIMP has been well-defined (and was first reported) in colorectal cancers, but research has also been performed on numerous cancer types such as gliomas, melanomas, and leukemia. The goal of the work presented here is to analyze the differences in gene mutation and expression between samples with CIMP (CIMP+) and samples without CIMP (CIMP-) across multiple cancer types. First we perform a coverage analysis to find the sets of genes and mutations which best correlate with CIMP+ samples while minimizing correlation with CIMP- samples. The cancer types used in these analyses are colorectal cancer (COADREAD), stomach cancer (STAD), and uterine corpus endometrial carcinoma (UCEC). The results from each tumor type are combined to create a synthesized set of mutations and mutated genes which correlate with CIMP+ samples. We then perform differential expression analysis on RNA-Seq data from The Cancer Genome Atlas (TCGA) to determine genes that are differentially expressed in CIMP+ samples when compared to CIMP- samples. We find 890 mutations and mutated genes with strong positive correlation to CIMP, many of which are also differentially expressed between CIMP+ and CIMP- samples. Among these mutated and differentially expressed genes are numerous genes in the MSigDB KRAS down-signaling gene set. This analysis furthers our understanding of CIMP and cancer in general and may give more insight into the differences between the genomic characteristics of CIMP-positive and CIMP-negative cancers.

    Committee: Lonnie R. Welch (Advisor) Subjects: Computer Science
  • 10. Li, Minghua Prediction of Long Non-Coding RNAs and Their Functions in Plant Immune Response

    Doctor of Philosophy, Miami University, 2024, Cell, Molecular and Structural Biology (CMSB)

    Long non-coding RNAs (lncRNAs) play critical roles in diverse biological processes. The extensive availability of public RNA-Seq data offers valuable resources for identifying novel lncRNAs. Here, we introduce LncDC (Long non-coding RNA detection), a machine learning-based tool designed to detect lncRNAs from RNA-Seq data. LncDC utilizes an XGBoost model incorporating features derived from primary sequences, secondary structures, and translated proteins to differentiate between lncRNAs and mRNAs. Notably, sequence and secondary structure k-mer score features, along with various open reading frame-related features, contribute to the classification of lncRNAs and mRNAs. Benchmarking experiments have shown that LncDC surpasses six state-of-the-art tools in several performance metrics. Applying LncDC to 180 RNA-Seq datasets from osteosarcoma patients led to the discovery of 97 novel osteosarcoma-specific lncRNAs. Additionally, the role of lncRNA in Oryza sativa RNase P protein 30 (OsRpp30)-mediated disease resistance in rice remains largely unexplored. OsRpp30 is known as a positive regulator of rice immunity against various pathogens. To further understand this mechanism, we conducted RNA-Seq and small RNA-Seq profiling of lncRNAs, miRNAs, and mRNAs in wild type, OsRpp30 overexpression, and OsRpp30 knockout rice plants. Our comprehensive transcriptome analysis identified 91 differentially expressed lncRNAs, 1671 differentially expressed mRNAs, and 41 differentially expressed miRNAs across these rice lines. We also explored interactions between differentially expressed lncRNAs and mRNAs, uncovering 10 trans- and 27 cis-targeting pairs specific to the OsRpp30 overexpression and knockout conditions. Furthermore, we constructed a competing endogenous RNA network comprising differentially expressed lncRNAs, miRNAs, and mRNAs to elucidate their interactions in rice immunity. 
Our findings reveal that lncRNAs participate in OsRpp30-mediated disease resistance in rice by regula (open full item for complete abstract)

    Committee: Chun Liang (Advisor); Haifei Shi (Committee Chair); Philippe Giabbanelli (Committee Member); Richard Moore (Committee Member); Tereza Jezkova (Committee Member) Subjects: Bioinformatics; Biology
  • 11. Macke, Amanda All About Allostery: A study of AAA nanomachines responsible for microtubule severing using molecular modelling, bioinformatics, and machine learning

    PhD, University of Cincinnati, 2024, Arts and Sciences: Chemistry

    The cytoskeleton, a key feature of the cell, acts as scaffolding that is responsible for maintaining the cell shape as well as forming a highway system for intra-cellular transportation. Thus, the cell must maintain strict regulation of its cytoskeleton to undergo deliberate change. Microtubules, an essential biopolymer of the cytoskeleton, are routinely severed by specific AAA (ATPases Associated with cellular Activities) nanomachines. Severing is required for a variety of significant cellular functions including, but not limited to, cellular division and neurogenesis. Changes to microtubules themselves, their various regulatory processes, and these proteins would have far reaching, serious implications on the viability and health of the cell and its organism. The microtubule severing enzymes are katanin, spastin, and fidgetin. Recent structural studies have solved hexameric structures for katanin and spastin in the presence of cofactors indicating they operate via a global conformational change induced by ATP hydrolysis. Simulations were previously used to study the functional states of both severing enzymes where it was identified that in long time-scales, at least one conformation will disassemble in the absence of cofactors. To further understand this observed disassembly process and the influence of the cofactors, a similar study of the resulting lower order oligomers was designed in part one. Through machine learning and in-house developed analyses, we recognized significant allosteric shifts due to the presence of ligands and neighboring protomers. During this study we also identified a particular region of katanin that is highly correlated with ligand binding from the helical bundle domain (HBD). We developed StELa, an in-house clustering algorithm, to characterize observed structural changes from simulation which identified a specific local conformational change due to ligand binding. 
In part two, this method was compared with other available algor (open full item for complete abstract)

    Committee: Ruxandra Dima Ph.D. (Committee Chair); Ryan White Ph.D. (Committee Member); Anna Gudmundsdottir Ph.D. (Committee Member) Subjects: Chemistry
  • 12. Eicher, Tara We're All in This Together: Learning Interpretable Models of Associations Between Multi-Omics Data

    Doctor of Philosophy, The Ohio State University, 2023, Computer Science and Engineering

    In many biomedical contexts, multiple types of BDMs (e.g., metabolites, genes, proteins, chromatin states, and DNA methylation sites) associate with one another directly or indirectly in groups or chains to impact phenotype or outcome. Certain significant associations often help in data interpretation and novel hypotheses generation, motivating researchers to identify the most impactful groups of BDM associations between multiple types of data. However, many state-of-the-art models focus either on individual BDM associations independently of one another or implement black box predictors of outcome that are agnostic of BDM associations. Moreover, collection of multiple types of BDMs in a subject (i.e., multi-omics data) is not always feasible, motivating the need to infer one omic type of data from another. This dissertation tackles the related problems of (1) using inter-omics approaches to infer BDM types from other related BDM types in specific contexts, (2) finding groups of multi-omics data BDMs associated with outcome through multivariate statistical analysis and graph-based predictive models, and (3) interpreting groups of multi-omics data BDMs associated with outcome in a functional context using existing knowledge. This dissertation addresses the problem of using inter-omics approaches to infer BDM types from other related BDM types in two domains of note: (1) regulatory element annotation, and (2) protein abundance prediction. First, this dissertation introduces the Self Organizing Map with Variable Neighborhoods (SOM-VN), designed to annotate regulatory elements across whole human genomes using shapes found in chromatin accessibility assays. The novelty of SOM-VN is that, while most computational tools for annotating regulatory elements require a suite of resource-intensive experimental assays, SOM-VN uses only a single assay to annotate regulatory elements. 
SOM-VN is validated on chromatin accessibility assays from multiple H1, HeLa, A549, and GM12878 ce (open full item for complete abstract)

    Committee: Raghu Machiraju (Advisor); Ewy Mathé (Advisor); Andrew Perrault (Committee Member); Rachel Kopec (Committee Member); Rachel Kelly (Committee Member) Subjects: Applied Mathematics; Artificial Intelligence; Bioinformatics; Biomedical Research; Biostatistics; Computer Science
  • 13. Ghandikota, Sudhir Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery.

    PhD, University of Cincinnati, 2023, Engineering and Applied Science: Computer Science and Engineering

    Graphs have become a convenient approach for representing complex real-world systems that contain a collection of objects and their relationships. They are extensively used to model data in various domains, including computer science, statistical physics, linguistics, and biological and social sciences. For instance, in the biological domain, networks are used to represent the interactions between proteins. Traditional network clustering and community detection algorithms are then applied for candidate gene prioritization and in silico biomarker discovery. However, given the ever-increasing size and complexity of networks, machine learning has become the primary approach for analyzing such graphs. The success of these models is highly dependent on the quality of user-designed input features. Alternatively, representation learning models work toward learning relevant representations of input data suitable for the task at hand. The learned representations can then be reused in subsequent downstream tasks as inputs. In addition, they can be used to determine the explanatory factors shared by two or more independent learning tasks. Recently, there has been a surge in representation learning frameworks for graph-structured data to learn node embeddings. However, computational frameworks capable of analyzing multiple networks simultaneously are still limited. Such implementations are particularly useful for research problems, such as in silico biomarker discovery, where multiple transcriptomic studies associated with a given disease are available but seldom used. In this dissertation, we developed novel feature learning frameworks capable of embedding network nodes from multiple datasets. In the first part of our work, we developed a skip-gram-based multi-task feature learning model that is capable of combining multiple supervised and/or unsupervised task objectives to learn continuous features of discrete entities. 
We used this model to extract contextualized gene (open full item for complete abstract)

    Committee: Anil Jegga DVM MRes (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Ali Minai Ph.D. (Committee Member); Yizong Cheng Ph.D. (Committee Member); Jing Chen Ph.D. (Committee Member) Subjects: Computer Science
  • 14. Green, Ryan Applying Deep Learning Techniques to Assist Bioinformatics Researchers in Analysis Pipeline Composition

    MS, University of Cincinnati, 2023, Engineering and Applied Science: Computer Science

    In this thesis, I address the problem of recommending computational tools during the construction of life science analysis workflows. The major motivation for such a system is to reduce the time bioinformaticians spend researching and selecting tools to complete an analysis. Constructing workflows is a time-consuming process that requires many careful decisions and extensive domain knowledge. The recent, rapid expansion of Bioinformatics research has led to new tools appearing daily, which further complicates the tool selection process. A great way to learn and construct new analyses is to consult existing ones. A system that can learn latent connections between tools from existing workflows and use them to suggest downstream tools or tool sequences in a new workflow-in-progress should be highly valuable to researchers in the creation process. The Bioinformatics Tool Recommendation system (BTR) is proposed to accomplish this task. BTR is a deep learning architecture that makes use of emergent graph neural network technology to find the most relevant successive tools for an input workflow query. Workflow construction is framed as a session-based recommendation problem and relevant techniques are applied. The method leverages a novel approach in representing workflows as directed acyclic graphs, rather than linear tool sequences, that sees benefits in recommendation performance and logical function. An attention mechanism is used to highlight recent workflow context and drop low-relevance tools. Semantic tool descriptions are mined and incorporated using a domain-specific language processing approach. Experiments show a significant improvement over the closest-related previous work for the automatic evaluation metrics.

    Committee: Tingting Yu Ph.D. (Committee Chair); Nan Niu Ph.D. (Committee Member); Jinze Liu PhD (Committee Member); Raj Bhatnagar Ph.D. (Committee Member) Subjects: Computer Science
  • 15. Tallman, David From conversations to copy numbers: Bioinformatic approaches to analyzing cancer patient data

    Doctor of Philosophy, The Ohio State University, 2023, Molecular, Cellular and Developmental Biology

    The number of cancer diagnoses worldwide is on the rise as populations continue to grow older. In the US, the amount of money allocated to cancer research by the National Cancer Institute increases yearly. With increasing focus on cancer research, it is important that researchers maintain perspective and ensure that these resources are utilized efficiently. The research mission of the Stover Lab is to improve the outcomes of patients with cancer. We keep the patients in mind during the entire research process, from project conception to publication. In this dissertation, three distinct research projects undertaken during my PhD are summarized. In Chapter 2, we investigated the survivorship needs of patients with gynecological cancers. By extracting posts made on the American Cancer Society's Cancer Survivorship forums, we discovered some of the needs of cancer patients by looking at their posted conversations and concerns. We developed an analysis methodology that allows extraction of posts that pertain to custom themes. We showed its utility by extracting and qualitatively analyzing posts that pertain to the psychosocial aspects of survivorship. In Chapter 3, novel image analysis-based algorithms were developed to investigate the patterns of expression of HER2 in breast cancer patients. Current treatment strategy for breast cancer is reliant on determining whether a patient is HER2 positive using a clinical immunohistochemistry (IHC) stain for HER2. The criteria used by pathologists for this test are simplistic, in that they only look at the proportion of intensely stained cells and use a single cutoff to define a patient as HER2 positive or negative. We believe there is an opportunity to gather more information from these IHC stains and use this information to further delineate breast cancer patients based on their HER2 expression, better predicting patient outcomes. 
We showed a new method that quantifies the heterogeneity of HER2 expression and significantly predicted recu (open full item for complete abstract)

    Committee: Daniel Stover (Advisor); Ramesh Ganju (Committee Member); Raghu Machiraju (Committee Member); Anne Strohecker (Committee Member) Subjects: Molecular Biology
  • 16. Ayoub, Christopher The Gene Expression Landscape of Alzheimer's Disease Tauopathy and Selective Vulnerability

    Doctor of Philosophy, The Ohio State University, 2023, Biomedical Sciences

    Alzheimer's Disease (AD) is a debilitating neurodegenerative disorder characterized by the progressive and selective accumulation of neurofibrillary tangles in specific areas of the brain over the course of disease. Composed of aggregated tau protein, these tangles appear to spread from the earliest affected regions to networked brain regions across the synapse, templating additional pathology in a prion-like manner. However, the cerebellum appears to resist this prion-like insult, despite connectivity to early and profoundly affected regions. The selective vulnerability and resistance of specific brain regions and cell types to prion-like tau pathology offers a window into disease etiology and endogenous mechanisms of neuroprotection. The objective of this work was to untangle the adaptive changes to disease that respond in parallel and in contrast between differentially vulnerable tissues to provide new insight into disease etiology and new targets for biological validation in disease models. First, we define a unique gene expression approach termed Ratio of Ratios that tests differential gene expression across AD and control in the vulnerable prefrontal cortex and the resistant cerebellum. We apply this along with Desirability Function Analysis to a publicly available microarray data set to sort genes into priority groups demonstrating contrasting differential expression between regions that associates with selective vulnerability, and parallel differential expression between regions that is nonspecific to vulnerability. Among contrasting genes, we find a neuronal and endothelial proteostasis signature where chaperones are selectively upregulated in the cerebellum. Among parallel genes, we find a microglial, astrocytic, and endothelial signature of immune and stress activation. Using transcription factor interaction network analysis, we report potential key regulators of these contrasting and parallel responses. 
We also show that the identified chaperone p (open full item for complete abstract)

    Committee: Jeffrey Kuret (Advisor); Karl Obrietan (Committee Member); Andrea Tedeschi (Committee Member); Hongjun Fu (Committee Member) Subjects: Bioinformatics; Biology; Biomedical Research; Neurosciences
  • 17. Nadwodney, Martin Understanding the Pathogenic Nature of L359V Variant of GATA-2 with Respect to Chronic Myeloid Leukemia.

    Bachelor of Science, Walsh University, 2022, Honors

    The way to cure a given illness is to understand the genetic components which contributed to its development within a given patient. Using the Xavier Method, a pie chart was created in Microsoft Excel which specified all the pathogenic proteins associated with Chronic Myeloid Leukemia (CML), including GATA-2, a transcription factor which upregulates cell proliferation while downregulating cell differentiation. Using the amino acid sequence of GATA-2 along with bioinformatic programs, the structure of the GATA-2 variant was developed and analyzed for conservation scores. This structural information revealed much about GATA-2's ability to bind to DNA and other transcription factors, and to contribute to CML. It is hoped that other students will pick up from this research to investigate the other proteins associated with CML through a similar bioinformatics approach.

    Committee: Thomas Freeland (Advisor); Jennifer Clevinger (Committee Co-Chair); Nina Rytwinski (Committee Co-Chair); Adam Underwood (Advisor) Subjects: Biochemistry; Bioinformatics; Computer Science; Oncology
  • 18. Cazares, Tareian Predicting Transcription Factor Binding in Humans with Context-Specific Chromatin Accessibility Profiles Using Deep Learning

    PhD, University of Cincinnati, 2022, Medicine: Immunology

    Each human genome shares approximately 99.9% of the same DNA sequence. The ~2-5 million sites that differ between any two humans contribute to a large array of phenotypic variation observed between individuals. Even more astonishing is the fact that all cells in an individual share the same genetic code but give rise to a multitude of different cell types. This incredible feat is possible due to the complex networks of genes and regulatory proteins that make up the logic of gene expression. Proteins known as transcription factors (TFs) play an active role in binding DNA to regulate gene expression. TFs are essential in controlling and establishing the networks of expressed genes making up different cell states across all eukaryotic organisms. In humans, most disease-associated genetic variants fall outside of protein-coding DNA and are often enriched in regulatory elements associated with DNA binding proteins such as TFs. Knowledge about TF binding helps improve our understanding of how gene expression is regulated and of potential mechanisms contributing to disease in humans. Computational methods are largely used to predict TF binding sites (TFBS), as the experimental characterization of most human TFs is intractable due to technical limitations. Instead, the most popular approaches use a TF's known DNA binding preferences, a TF motif, to look for matches in the genome. However, TF motifs are 5-30 bp long and occur frequently across the 3.2 billion bases of the human genome. TFBS predictions can be improved by using information about which areas of the genome are accessible and actively poised for TF binding. These areas of accessible chromatin are often indicative of genomic regions that are actively transcribed and connected to the current cell-state. Experimental methods, such as the assay for transposase accessible chromatin (ATAC-seq), have been developed to measure which areas of the genome are open, and thus primed for TF binding. 
This dissertation describes (open full item for complete abstract)

    Committee: Artem Barski Ph.D. (Committee Member); Leah Claire Kottyan Ph.D. (Committee Member); Matthew Weirauch Ph.D. (Committee Member); Stephen Waggoner Ph.D. (Committee Member); Emily Miraldi Ph.D. (Committee Member) Subjects: Genetics
  • 19. Blackburn, Jessica Integrative Approaches to Evaluate Gliosis in Pediatric Neuropathology

    Doctor of Philosophy, The Ohio State University, 2022, Anatomy

    Machine learning is a popular tool commonly used to improve diagnostic criteria, clinical decision making, and patient outcomes in leading causes of death, such as cancer, heart disease, and stroke. Through the integration of epidemiological, genomic, radiologic, and histological workflows, clinicians and researchers can understand complex diseases. However, not all aspects of biomedical research have implemented machine learning modalities. Research focused on neonatal pathology and neurodevelopment still utilizes human-observer based and/or traditional computational approaches. In this dissertation, I have applied computationally advanced machine learning models to a variety of data sources to evaluate gliosis within neonatal pathology. I have modified and applied, for the first time, workflows previously used in biomedical informatics across three aims. Aim 1 is focused on clustering of maternal and infantile characteristics of sudden infant death syndrome (SIDS) decedents to identify unique subgroups with potential neurodevelopmental differences. In Aim 2, I developed an immunofluorescence image analysis segmentation workflow to elucidate the region-specific morphological changes astrocytes undergo following systemic inflammation. In Aim 3, I expanded on Aim 2's work and developed an image analysis workflow by extracting pixel intensity, pixel texture, and intracellular Ca++ transient propagation imaging features from 50 GB of time series astrocyte calcium imaging data. With these features, I objectively identify and quantify waveform morphology of reactive astrocytes to evaluate by multivariate analyses the heterogeneous response to inflammation and hypoxia. Periods of hypoxia, sleep apnea, and inflammation are reported to induce brainstem gliosis, a known forensic finding in SIDS decedents; the impact of reactive astrogliosis on astrocyte cell function has eluded us due to the lack of high throughput and unbiased data analysis workflows. 
The purpose of this stu (open full item for complete abstract)

    Committee: José Otero (Advisor); James Cray Jr. (Advisor); Christopher Pierson (Committee Member); Melissa Quinn (Committee Member) Subjects: Anatomy and Physiology; Biomedical Research; Pathology
  • 20. Reddy, Vineet Single Cell Transcriptomic-informed Microcircuit Computer Modelling of Temporal Lobe Epilepsy

    Master of Science in Biomedical Sciences (MSBS), University of Toledo, 2022, Biomedical Sciences (Bioinformatics and Proteomics/Genomics)

    Temporal Lobe Epilepsy (TLE) is one of the most common neurological disorders and is characterized by recurrent and spontaneous seizures. Although TLE genetic and electrophysiological markers such as gamma oscillations are well characterized, alterations in the interactions between neurons predisposing a cortical region to seizures are not fully understood. To study these non-linear interactions, we incorporated RNA expression changes into a microcircuit computer model of the hippocampus, an area strongly implicated in TLE. Cellular deconvolution of bulk RNAseq data with single-cell transcriptomic data from the hippocampi of pilocarpine-induced temporal lobe epilepsy mice revealed three distinct cell clusters characterized as pyramidal (PYR) cells, oriens-lacunosum moleculare (OLM) interneurons, and parvalbumin-positive (PV) interneurons. We used the differential expression (log fold change) of genes coding for the Alpha-Amino-3-Hydroxy-5-Methyl-4-Isoxazole Propionic Acid (AMPA), N-methyl-D-aspartate (NMDA), and Gamma-aminobutyric acid type A (GABAA) receptor subunits in the control and epileptic conditions for each cell cluster to guide scaling of receptor density in the model. The model was composed of 800 PYR, 200 PV and 200 OLM neurons. PYR cells of the model activate PV, OLM, and other pyramidal cells via NMDA and AMPA receptors; in return, the PV and OLM interneurons inhibit PYR cells by acting on their GABAA receptors. Guided by the RNA expression data, we ran simulations where we increased the density of PYR AMPAR, OLM NMDAR, PV AMPAR, and PV GABAAR scaling. PYR GABAAR subunits were both upregulated and downregulated and thus, both changes were implemented when running simulations. Our simulations showed two dynamical changes with the RNA sequence changes. The first is the expected increased seizure susceptibility, reflected as increased gamma power. That pattern took place with pyramidal AMPAR/GABAAR upscaling. 
The second pattern was a surprising reduc (open full item for complete abstract)

    Committee: Robert Mccullumsmith (Advisor); Rammohan Shukla (Committee Co-Chair); Mohamed Sherif (Committee Member); Bruce Bamber (Committee Member); Imran Ali (Committee Member) Subjects: Bioinformatics; Biophysics; Neurosciences