De novo Population Discovery from Complex Biological Datasets

Venkatasubramanian, Meenakshi

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047

Year and Degree

2019, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.

Abstract

Over the past decade, numerous clustering approaches have been developed and applied to gene expression studies for the unsupervised detection of sub-populations that inform disease prognosis, treatment and mechanism. For example, in diverse cancers, the identification of novel patient subtypes from gene expression can highlight new therapeutic pathways and cooperating mutations. In addition to measuring the transcriptional activity of genes, modern high-throughput sequencing technologies enable the sensitive detection of higher-resolution features, including alternative splicing, RNA editing and chromatin modifications. The detection of such features presents a number of computational challenges, due in large part to the sparse nature of the data, its high dimensionality (hundreds of thousands of features) and the presence of overlapping molecular/genetic subtypes that range from broad to exceedingly rare. In this dissertation, I describe the development of a series of novel methodologies that address these computational challenges and aim to uncover the hidden heterogeneity within complex molecular datasets. The first of these algorithms, splice-ICGS, provides an automated and accurate solution for the detection of complex, overlapping splicing-defined subtypes from large bulk RNA-sequencing datasets. This solution required the introduction of several key innovations, including new methods for sparse matrix filtering, correlation-based feature prioritization, iterative sparse-NMF analysis and a new strategy for multi-label classification. I demonstrate the improved performance of this approach in multiple clinical cancer datasets, with an emphasis on leukemia. To improve our understanding of the causal nature of such known and novel splicing subtypes, I have further developed several downstream analysis tools (Bridger, RBP-Finder) that can predict causal regulators from splicing subtypes in an automated manner.
These unsupervised approaches were further adapted to solve a distinct problem in the field of single-cell RNA-sequencing analysis: the improved unsupervised detection of common and rare cell populations from ultra-large studies of hundreds of thousands of cells. With these new algorithms in hand, the genomics research community will be presented with novel opportunities for therapeutic target identification, patient classification from splicing data and the delineation of novel cell populations in healthy tissues and disease.

Committee

Nathan Salomonis, M.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Raj Bhatnagar, Ph.D. (Committee Member)
Kakajan Komurov, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)

Pages

113 p.

Subject Headings

Computer Science

Keywords

Clustering; Alternative Splicing; Bioinformatics; Non-Negative Matrix Factorization; Data Mining; Community Detection

Recommended Citations

APA Style (7th edition)
Venkatasubramanian, M. (2019). De novo Population Discovery from Complex Biological Datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047

MLA Style (8th edition)
Venkatasubramanian, Meenakshi. De novo Population Discovery from Complex Biological Datasets. 2019. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047.

Chicago Manual of Style (17th edition)
Venkatasubramanian, Meenakshi. "De novo Population Discovery from Complex Biological Datasets." Doctoral dissertation, University of Cincinnati, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047

Document number:

ucin1563873297599047

Download Count:

156

Copyright Info

De novo Population Discovery from Complex Biological Datasets by Meenakshi Venkatasubramanian is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Cincinnati and OhioLINK.
