Files
Dissertation (Haitao Zhao).pdf (3.87 MB)
Learning Genetic Networks Using Gaussian Graphical Model and Large-Scale Gene Expression Data
Author Info
Zhao, Haitao
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=akron1595682639738664
Abstract Details
Year and Degree
2020, Doctor of Philosophy, University of Akron, Integrated Bioscience.
Abstract
The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph that encodes the conditional dependences between genes. Many GGM-based algorithms have been proposed for learning genetic network structures. Because the number of gene variables is typically far greater than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of the GGM has become a popular tool for inferring the conditional interdependence among genes. In this study, guided by the human cancer pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG), I extracted the genes involved in each specific KEGG pathway, together with their RNA-seq expression levels in cancer and normal tissues from The Cancer Genome Atlas (TCGA), and constructed two types of small gene expression datasets, normal and cancer, for the gene sets of different types of human cancers. I applied graphical lasso directly to these datasets to infer the conditional dependences among the genes. Integrated analysis and comparison of the inferred normal and cancer networks reveal strong conditional dependences among the genes at the RNA-seq expression level and further confirm the essential roles played by genes that encode proteins involved in two key signaling pathways, phosphoinositide 3-kinase (PI3K)/AKT/mTOR and Ras/Raf/MEK/ERK, in human carcinogenesis. These strong conditional dependences elucidate the expression-level interactions among genes that are implicated in many different human cancers. The inferred genetic networks were examined further to identify and characterize a collection of gene interactions that are unique to cancers.
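To make the pathway-level step concrete, the following is a minimal sketch of fitting graphical lasso to a small samples-by-genes matrix and reading the result as partial correlations. It is not the dissertation's code: the synthetic data, the penalty `alpha=0.2`, the standardization step, and the helper names `partial_correlations` and `infer_network` are all assumptions made for illustration, and scikit-learn's `GraphicalLasso` stands in for whichever implementation the author used.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def partial_correlations(precision):
    """Convert a precision matrix P into partial correlations.

    Uses the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def infer_network(expr, alpha=0.2):
    """Fit graphical lasso to a samples-by-genes expression matrix.

    alpha is an illustrative penalty, not a value from the dissertation.
    """
    # Standardize each gene so one penalty applies on a common scale.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    model = GraphicalLasso(alpha=alpha, max_iter=200).fit(z)
    return partial_correlations(model.precision_)


# Synthetic stand-in for a pathway-sized dataset (hypothetical data):
# six independent genes, then gene 1 is made to depend on gene 0.
rng = np.random.default_rng(0)
n_samples, n_genes = 300, 6
expr = rng.normal(size=(n_samples, n_genes))
expr[:, 1] = 0.8 * expr[:, 0] + 0.6 * rng.normal(size=n_samples)

pcor = infer_network(expr)
print(np.round(pcor, 2))
```

In this reading, a nonzero off-diagonal entry of `pcor` is an edge of the inferred network. Running the same function separately on a normal-tissue matrix and a cancer-tissue matrix for the same gene set would give the two networks that the abstract compares.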
The cross-cancer genetic interactions revealed by this study provide another body of knowledge from which cancer biologists can propose strong hypotheses, so that further biological investigations can be conducted effectively. Undoubtedly, a global network that includes all genes profiled in the TCGA genome-wide gene expression data would give biologists more insight into genetic interactions than subnetworks of selected genes stipulated by KEGG pathways. However, graphical lasso, although it performs well on low-dimensional datasets, is computationally expensive and inefficient on genome-wide gene expression datasets, or even unable to work on them directly. In this study, inspired by the divide-and-conquer strategy, the Monte Carlo method, and the idea of graphical lasso, I proposed a simple but efficient method to learn global genetic networks using graphical lasso and genome-wide RNA-seq datasets. The method uses a Monte Carlo approach to sample subnetworks; the subnetworks estimated with graphical lasso are then integrated to approximate the global genetic network. The convergence of this Monte Carlo Gaussian graphical model (MCGGM) was evaluated on a relatively small real dataset of RNA-seq expression levels, and the results indicate a strong ability to recover interactions with high conditional dependence. I inferred genetic networks by applying the MCGGM to genome-wide datasets of RNA-seq expression levels and summarized the top gene-gene interactions that indicate high interdependence. Most of the top gene-gene interactions have been reported in the literature to play important roles in different human cancers, which suggests that these conditional dependences are present in real genetic networks and, to some extent, validates the ability and reliability of the proposed MCGGM to identify strong conditional dependences among genes.
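The abstract does not say how subnetworks are sampled or integrated, so the sketch below shows only one plausible reading of the sample-then-integrate idea. It draws uniform random gene subsets, fits graphical lasso on each, and averages each pair's partial correlation over the draws in which both genes appeared. The function name `mc_ggm`, the uniform sampling, the averaging rule, and every parameter value are assumptions rather than the author's method. Each subnetwork conditions only on the sampled genes, so the average approximates the full-conditional estimate and does not reproduce it.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def mc_ggm(expr, subset_size=8, n_draws=100, alpha=0.2, seed=0):
    """Approximate a global network from many small graphical lasso fits.

    Assumed scheme (not taken from the dissertation): uniform random gene
    subsets, with each edge weight averaged over the draws that sampled
    both of its genes.
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)

    weight_sum = np.zeros((n_genes, n_genes))
    pair_count = np.zeros((n_genes, n_genes))
    for _ in range(n_draws):
        # Sample a subnetwork: a random subset of genes without repeats.
        genes = rng.choice(n_genes, size=subset_size, replace=False)
        fit = GraphicalLasso(alpha=alpha, max_iter=200).fit(z[:, genes])
        prec = fit.precision_
        d = np.sqrt(np.diag(prec))
        pcor = -prec / np.outer(d, d)  # partial correlations in the subset
        # Accumulate the subnetwork into the global matrices.
        idx = np.ix_(genes, genes)
        weight_sum[idx] += pcor
        pair_count[idx] += 1

    # Average over co-occurrences; pairs never sampled together stay at 0.
    network = np.divide(weight_sum, pair_count,
                        out=np.zeros_like(weight_sum), where=pair_count > 0)
    np.fill_diagonal(network, 0.0)
    return network, pair_count


# Hypothetical data: 20 genes, with gene 1 made to depend on gene 0.
rng = np.random.default_rng(1)
expr = rng.normal(size=(300, 20))
expr[:, 1] = 0.8 * expr[:, 0] + 0.6 * rng.normal(size=300)

network, pair_count = mc_ggm(expr)

# Rank gene pairs by absolute averaged weight, strongest first.
rows, cols = np.triu_indices(network.shape[0], k=1)
order = np.argsort(-np.abs(network[rows, cols]))
top_pair = (int(rows[order[0]]), int(cols[order[0]]))
print(top_pair, round(float(network[top_pair]), 2))
```

Because every fit involves only `subset_size` genes, the cost of one draw does not grow with the total number of genes, and the draws are independent of one another, which is consistent with the efficiency and scalability claims in the abstract. How many draws are needed for stable edge weights is left open here.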
In conclusion, I developed a learning approach (MCGGM) that can be deployed to obtain genome-wide genetic networks. The MCGGM combines Monte Carlo sampling with graphical lasso; it is efficient in both computational time and space, and it is highly scalable. The estimated genetic networks from this study provide another body of knowledge for biomedical researchers to understand the expression-level interconnections among genes, and hence facilitate the proposal and development of strong research questions and hypotheses.
Committee
Zhong-Hui Duan (Advisor)
Sujay Datta (Committee Member)
Qin Liu (Committee Member)
Timothy O'Neil (Committee Member)
Yingcai Xiao (Committee Member)
Pages
151 p.
Subject Headings
Bioinformatics; Computer Science
Keywords
Gaussian Graphical Model; RNA-seq Expression; TCGA; Monte Carlo Gaussian Graphical Model Algorithm; Genetic Networks
Recommended Citations
Zhao, H. (2020). Learning Genetic Networks Using Gaussian Graphical Model and Large-Scale Gene Expression Data [Doctoral dissertation, University of Akron]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=akron1595682639738664
APA Style (7th edition)
Zhao, Haitao. Learning Genetic Networks Using Gaussian Graphical Model and Large-Scale Gene Expression Data. 2020. University of Akron, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=akron1595682639738664.
MLA Style (8th edition)
Zhao, Haitao. "Learning Genetic Networks Using Gaussian Graphical Model and Large-Scale Gene Expression Data." Doctoral dissertation, University of Akron, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=akron1595682639738664
Chicago Manual of Style (17th edition)
Document number:
akron1595682639738664
Download Count:
136
Copyright Info
© 2020, all rights reserved.
This open access ETD is published by University of Akron and OhioLINK.