Files
34225.pdf (14.57 MB)
De novo Population Discovery from Complex Biological Datasets
Author Info
Venkatasubramanian, Meenakshi
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047
Abstract Details
Year and Degree
2019, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Abstract
Over the past decade, numerous clustering approaches have been developed and applied to gene expression studies for the unsupervised detection of sub-populations that inform disease prognosis, treatment and mechanism. For example, in diverse cancers, the identification of novel patient subtypes from gene expression can highlight new therapeutic pathways and cooperating mutations. In addition to the measurement of transcriptional activity from genes, modern high-throughput sequencing technologies enable the sensitive detection of higher-resolution features including alternative splicing, RNA-editing and chromatin modifications. The detection of such features presents a number of computational challenges, due in large part to the sparsity of the data, its high dimensionality (hundreds of thousands of features) and the presence of both broad and exceedingly rare molecular/genetic subtypes that overlap. In this dissertation, I describe the development of a series of novel methodologies that address these computational challenges and aim to uncover the hidden heterogeneity within complex molecular datasets. The first of these algorithms, splice-ICGS, provides an automated and accurate solution for the detection of complex overlapping splicing-defined subtypes from large bulk RNA-sequencing datasets. Our solution required the introduction of several key innovations including new methods for sparse matrix filtering, correlation-based feature prioritization, iterative sparse-NMF analysis and a new strategy for multi-label classification. I demonstrate the improved performance of this approach in multiple clinical cancer datasets with an emphasis on leukemia. To improve our understanding of the causal nature of such known and novel splicing subtypes, I have further developed several downstream analysis tools that can predict causal regulators from splicing subtypes in an automated manner (Bridger, RBP-Finder).
These unsupervised approaches were further adapted to solve a distinct problem in the field of single-cell RNA-Sequencing analysis: the improved unsupervised detection of common and rare cell populations from ultra-large studies of hundreds of thousands of cells. With these new algorithms in hand, the genomics research community will be presented with novel opportunities for therapeutic target identification, patient classification from splicing data and the delineation of novel cell populations in healthy tissues and disease.
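The abstract names non-negative matrix factorization (NMF) followed by multi-label classification as ingredients for finding overlapping subtypes. The sketch below illustrates that general idea only. It is not the splice-ICGS implementation, and it omits the sparse filtering, correlation-based prioritization and sparsity constraints the abstract mentions. The function names (`nmf`, `multilabel_assign`, `toy_data`), the multiplicative-update solver, the 0.3 threshold and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (features x samples) as W @ H.

    Uses plain Lee-Seung multiplicative updates for the Frobenius loss.
    This is an illustrative stand-in, not the dissertation's solver.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each column of W sums to 1; the product W @ H is unchanged
    # and the rows of H become comparable across components.
    scale = W.sum(axis=0) + eps
    return W / scale, H * scale[:, None]


def multilabel_assign(H, threshold=0.3):
    """Boolean (components x samples) matrix of memberships.

    A sample joins every component that carries at least `threshold`
    of its total loading, so one sample may receive several labels.
    The threshold value is a hypothetical choice.
    """
    share = H / (H.sum(axis=0, keepdims=True) + 1e-12)
    return share >= threshold


def toy_data():
    """6 features x 9 samples with two signatures and an overlapping group."""
    X = np.zeros((6, 9))
    X[:3, :3] = 1.0   # samples 0-2 carry signature 1 only
    X[3:, 3:6] = 1.0  # samples 3-5 carry signature 2 only
    X[:, 6:] = 1.0    # samples 6-8 carry both signatures
    return X


X = toy_data()
W, H = nmf(X, k=2)
labels = multilabel_assign(H)
print(labels.sum(axis=0))  # number of labels assigned to each sample
```

The point of thresholding the per-sample loadings, rather than taking the single largest component, is that a sample carrying two signatures can keep both labels, which a hard one-cluster-per-sample assignment cannot express.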
Committee
Nathan Salomonis, M.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Raj Bhatnagar, Ph.D. (Committee Member)
Kakajan Komurov, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Pages
113 p.
Subject Headings
Computer Science
Keywords
Clustering; Alternative Splicing; Bioinformatics; Non-Negative Matrix Factorization; Data Mining; Community Detection
Recommended Citations
Citations
Venkatasubramanian, M. (2019). De novo Population Discovery from Complex Biological Datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047
APA Style (7th edition)
Venkatasubramanian, Meenakshi. De novo Population Discovery from Complex Biological Datasets. 2019. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047.
MLA Style (8th edition)
Venkatasubramanian, Meenakshi. "De novo Population Discovery from Complex Biological Datasets." Doctoral dissertation, University of Cincinnati, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047
Chicago Manual of Style (17th edition)
Document number: ucin1563873297599047
Download Count: 156
Copyright Info
© 2019, some rights reserved.
De novo Population Discovery from Complex Biological Datasets by Meenakshi Venkatasubramanian is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Cincinnati and OhioLINK.