De novo Population Discovery from Complex Biological Datasets

Venkatasubramanian, Meenakshi

Abstract Details

2019, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Over the past decade, numerous clustering approaches have been developed and applied to gene expression studies for the unsupervised detection of sub-populations that inform disease prognosis, treatment and mechanism. For example, in diverse cancers, the identification of new patient subtypes from gene expression can highlight novel therapeutic pathways and cooperating mutations. In addition to the measurement of transcriptional activity from genes, modern high-throughput sequencing technologies enable the sensitive detection of higher-resolution features including alternative splicing, RNA-editing and chromatin modifications. The detection of such features presents a number of computational challenges, due in large part to the sparse nature of the data, its high dimensionality (hundreds of thousands of features) and the presence of both broad and exceedingly rare molecular/genetic subtypes that overlap. In this dissertation, I describe the development of a series of novel methodologies that address these computational challenges and aim to uncover the hidden heterogeneity within complex molecular datasets. The first of these algorithms, splice-ICGS, provides an automated and accurate solution for the detection of complex overlapping splicing-defined subtypes from large bulk RNA-sequencing datasets. Our solution required the introduction of several key innovations, including new methods for sparse matrix filtering, correlation-based feature prioritization, iterative sparse-NMF analysis and a new strategy for multi-label classification. I demonstrate the improved performance of this approach in multiple clinical cancer datasets, with an emphasis on leukemia. To improve our understanding of the causal nature of such known and novel splicing subtypes, I have further developed several downstream analysis tools (Bridger, RBP-Finder) that can predict causal regulators from splicing subtypes in an automated manner.
These unsupervised approaches were further adapted to solve a distinct problem in the field of single-cell RNA-sequencing analysis: the improved unsupervised detection of common and rare cell populations from ultra-large studies of hundreds of thousands of cells. With these new algorithms in hand, the genomics research community will be presented with novel opportunities for therapeutic target identification, patient classification from splicing data and the delineation of novel cell populations in healthy tissues and disease.
Nathan Salomonis, M.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Raj Bhatnagar, Ph.D. (Committee Member)
Kakajan Komurov, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
113 p.

Recommended Citations

  • Venkatasubramanian, M. (2019). De novo Population Discovery from Complex Biological Datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047

    APA Style (7th edition)

  • Venkatasubramanian, Meenakshi. De novo Population Discovery from Complex Biological Datasets. 2019. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047.

    MLA Style (8th edition)

  • Venkatasubramanian, Meenakshi. "De novo Population Discovery from Complex Biological Datasets." Doctoral dissertation, University of Cincinnati, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047

    Chicago Manual of Style (17th edition)