We're All in This Together: Learning Interpretable Models of Associations Between Multi-Omics Data


2023, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
In many biomedical contexts, multiple types of BDMs (e.g., metabolites, genes, proteins, chromatin states, and DNA methylation sites) associate with one another directly or indirectly in groups or chains to impact phenotype or outcome. Significant associations often aid data interpretation and the generation of novel hypotheses, motivating researchers to identify the most impactful groups of BDM associations between multiple types of data. However, many state-of-the-art models either focus on individual BDM associations independently of one another or implement black box predictors of outcome that are agnostic to BDM associations. Moreover, collection of multiple types of BDMs in a subject (i.e., multi-omics data) is not always feasible, motivating the need to infer one omic type of data from another. This dissertation tackles the related problems of (1) using inter-omics approaches to infer BDM types from other related BDM types in specific contexts, (2) finding groups of multi-omics BDMs associated with outcome through multivariate statistical analysis and graph-based predictive models, and (3) interpreting groups of multi-omics BDMs associated with outcome in a functional context using existing knowledge. This dissertation addresses the problem of using inter-omics approaches to infer BDM types from other related BDM types in two domains of note: (1) regulatory element annotation, and (2) protein abundance prediction. First, this dissertation introduces the Self Organizing Map with Variable Neighborhoods (SOM-VN), designed to annotate regulatory elements across whole human genomes using shapes found in chromatin accessibility assays. The novelty of SOM-VN is that, while most computational tools for annotating regulatory elements require a suite of resource-intensive experimental assays, SOM-VN uses only a single assay to annotate regulatory elements.
SOM-VN is validated on chromatin accessibility assays from multiple cell lines (H1, HeLa, A549, and GM12878) and on B-cells, heart, stomach, and brain tissue. Second, multiple methods for predicting protein abundance from messenger RNA (mRNA) transcripts in the genome are evaluated and combined in the context of breast and ovarian cancer cells. Next, this dissertation addresses the problem of identifying the BDM associations most predictive of outcome, where BDM associations are between multiple types of data. This dissertation first introduces IntLIM 2.0, a software package that builds many linear models of outcome-dependent BDM associations in parallel, generating a set of multi-omics BDM associations that can be filtered, statistically validated, and visualized. IntLIM 2.0 is applied to two separate pediatric asthma cohorts to find associations between gene expression levels and metabolite abundance levels that are dependent on and/or predictive of Immunoglobulin E (IgE) levels in blood serum. This dissertation then introduces the Graph Ensemble Neural Network (GENN), a predictive ensemble of these linear models constructed by optimally consolidating significant linear BDM association models and rearranging the consolidated model to predict outcome. The key innovations of GENN are the use of pooling and pruning across multiple levels of the graph to consolidate associations and the use of metafeatures to reduce the model parameter space. GENN is validated on simulated data, the asthma cohorts, and a multi-omic pan-cancer data set where the outcome is chemotherapeutic drug response. GENN is then applied to find associations between gene expression levels and DNA methylation levels in gliomas that are dependent on and/or predictive of survival group.
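As a rough illustration of what "many linear models of outcome-dependent associations" can look like, the sketch below fits one small linear model per gene-metabolite pair and reads off an interaction coefficient. It is not the IntLIM 2.0 package or its API. The interaction-term formulation (metabolite ~ gene + phenotype + gene × phenotype), the function name `interaction_coefficients`, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def interaction_coefficients(genes, metabolites, phenotype):
    """Fit metabolite ~ gene + phenotype + gene:phenotype for every pair.

    Hypothetical helper, not part of any published package.
    genes: (n_samples, n_genes); metabolites: (n_samples, n_metabolites);
    phenotype: (n_samples,) numeric outcome, e.g. a 0/1 group label.
    Returns an (n_genes, n_metabolites) array of interaction coefficients.
    """
    n_samples, n_genes = genes.shape
    coefs = np.empty((n_genes, metabolites.shape[1]))
    for i in range(n_genes):
        g = genes[:, i]
        # Columns: intercept, gene, phenotype, gene-by-phenotype interaction.
        design = np.column_stack([np.ones(n_samples), g, phenotype, g * phenotype])
        # One least-squares solve fits this gene against all metabolites at once.
        beta, *_ = np.linalg.lstsq(design, metabolites, rcond=None)
        coefs[i] = beta[3]  # row 3 holds the interaction coefficients
    return coefs


# Simulated data (assumption): 200 subjects, 3 genes, 2 metabolites.
rng = np.random.default_rng(0)
n = 200
phenotype = rng.integers(0, 2, size=n).astype(float)
genes = rng.normal(size=(n, 3))
metabolites = rng.normal(scale=0.1, size=(n, 2))
# Plant one outcome-dependent association:
# gene 1 drives metabolite 0 only when phenotype == 1.
metabolites[:, 0] += 2.0 * genes[:, 1] * phenotype

coefs = interaction_coefficients(genes, metabolites, phenotype)
top_pair = np.unravel_index(np.argmax(np.abs(coefs)), coefs.shape)
print("strongest interaction at (gene, metabolite):", top_pair)
```

In this toy setting the planted pair should stand out because its interaction coefficient is far from zero while the others reflect only noise. A real analysis would also need significance testing and multiple-testing correction, which this sketch omits.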
Finally, this dissertation introduces a novel method for multi-omics biological pathway enrichment analysis, the Multi-Omics Knowledge Graph (MOKnG), which addresses key challenges in state-of-the-art multi-omics pathway analysis methods. MOKnG analysis is performed on a novel type of weighted multi-omics graph where BDMs are edges, and edge weights estimate common BDM functionality using a novel metric. Using graph traversal techniques and an input set of BDMs of interest, MOKnG analysis finds enriched modules of the graph and groups of pathways represented by each module. It is shown that (1) MOKnG finds concordant pathways enriched across disparate COVID-19 metabolomics cohorts, whereas state-of-the-art multi-omics pathway analysis methods do not, (2) MOKnG enriched pathways include a higher percentage of non-random pathways than a baseline technique, and (3) MOKnG recovers enriched pathways in simulated data more accurately than a baseline technique in the presence of missing BDMs. Overall, the methods introduced in this dissertation (i.e., SOM-VN, the proteogenomics ensemble, IntLIM 2.0, GENN, and MOKnG) advance the field of multi-omics integration in multiple respects by leveraging different types of associations between multi-omics BDMs (i.e., associations learned from predictive models, linear relationships, and relationships from existing knowledge). These advancements span the range of inter-omics predictive methods, methods for predicting outcome from multi-omics data, and methods for functionally interpreting multi-omics data.
Raghu Machiraju (Advisor)
Ewy Mathé (Advisor)
Andrew Perrault (Committee Member)
Rachel Kopec (Committee Member)
Rachel Kelly (Committee Member)

Recommended Citations

Citations

  • Eicher, T. (2023). We're All in This Together: Learning Interpretable Models of Associations Between Multi-Omics Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701305460789336

    APA Style (7th edition)

  • Eicher, Tara. We're All in This Together: Learning Interpretable Models of Associations Between Multi-Omics Data. 2023. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1701305460789336.

    MLA Style (8th edition)

  • Eicher, Tara. "We're All in This Together: Learning Interpretable Models of Associations Between Multi-Omics Data." Doctoral dissertation, Ohio State University, 2023. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701305460789336

    Chicago Manual of Style (17th edition)