Files
ZhaoH.the (final comments 1).pdf (1.41 MB)
Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning
Author Info
Zhao, Haitao
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295
Abstract Details
Year and Degree
2014, Master of Science, University of Akron, Computer Science.
Abstract
High-throughput next-generation sequencing has revolutionized genomic sequencing. It allows thousands of genes, and even the entire exome of an organism, to be studied simultaneously. Together with other high-throughput technologies such as DNA microarrays, it has broadened the applications of genomic sequencing and changed biomedical research in a profound way. Compared with microarray data, the data generated by next-generation sequencing are considerably more reliable. As a result, the technique has rapidly emerged as a major tool for obtaining gene mutation and expression profiles of human cancers. The availability of these large genomic datasets presents unique scientific challenges and opportunities. One such challenge is to understand and characterize the patterns of genomic mutation and gene expression in the different cancer types represented in the datasets. Many data mining approaches have already been developed to analyze large datasets for feature selection and sample classification. Because mutation and gene expression profiles are noisy owing to both biological and technical variation, the effectiveness and robustness of a machine-learning-based classification system depend significantly on the nature of the input data. In this study, we explore DNA mutation and gene expression patterns in lung cancer using support vector machines with embedded parameter tuning. The two datasets used are derived from the somatic mutation data and RNA-seq gene expression profiles in TCGA (The Cancer Genome Atlas). The embedded parameter tuning mines the training dataset using validation techniques and the concept of committee voting. We show that support vector machines with tuning significantly improve robustness and classification accuracy compared with regular support vector machines.
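The abstract does not spell out the tuning procedure, so the following is only a minimal sketch of one plausible reading: for each cross-validation fold, a linear SVM is trained for several candidate regularization values, the value that scores best on the held-out fold is kept, and the resulting per-fold models vote on new samples. Everything here is an assumption for illustration, including the synthetic two-feature data, the candidate values in `LAMBDAS`, the batch subgradient trainer, and all function names; none of it is taken from the thesis.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam, steps=200):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, steps + 1):
        eta = 1.0 / (lam * t)              # decaying step size
        pull = [0.0] * d                   # mean of y_i * x_i over margin violators
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:
                for j in range(d):
                    pull[j] += yi * xi[j] / n
        w = [(1.0 - eta * lam) * wj + eta * pj for wj, pj in zip(w, pull)]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1

def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

def tune_committee(X, y, lams, k=5):
    """Per fold: pick the lambda that validates best, keep that model."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    committee = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_va, y_va = [X[i] for i in held_out], [y[i] for i in held_out]
        best = None
        for lam in lams:
            w = train_linear_svm(X_tr, y_tr, lam)
            score = accuracy(w, X_va, y_va)
            if best is None or score > best[0]:
                best = (score, lam, w)
        committee.append((best[1], best[2]))
    return committee

def committee_predict(committee, x):
    votes = sum(predict(w, x) for _, w in committee)
    return 1 if votes >= 0 else -1         # k is odd here, so no ties occur

def make_data(n, rng):
    """Hypothetical data: two Gaussian classes plus a constant bias feature."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0), 1.0])
        y.append(label)
    return X, y

LAMBDAS = [0.01, 0.1, 1.0]                 # assumed candidate grid
rng = random.Random(0)
X_train, y_train = make_data(80, rng)
X_test, y_test = make_data(40, rng)

committee = tune_committee(X_train, y_train, LAMBDAS, k=5)
test_acc = sum(committee_predict(committee, x) == t
               for x, t in zip(X_test, y_test)) / len(y_test)
print("chosen lambdas per fold:", [lam for lam, _ in committee])
print("committee test accuracy:", test_acc)
```

Because the parameter is chosen inside each fold and the final label comes from a vote, no single train/validation split decides the outcome, which is one way tuning of this kind could improve robustness over a single SVM with fixed parameters.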
The approach was applied to the two datasets to explore the mutation patterns that distinguish smokers from non-smokers in lung adenocarcinoma, and the expression patterns that distinguish lung adenocarcinoma from lung squamous cell carcinoma, two subtypes of lung cancer. Our results reveal no obvious mutation patterns separating smokers from non-smokers. Because the TCGA lung adenocarcinoma dataset contains only a few samples from non-smokers, it is possible that the results are attributable both to the nature of lung cancer and to the unbalanced representation of the two classes under consideration. The gene expression patterns, on the other hand, differ markedly between the two subtypes of lung cancer, supporting the conclusion that cancer tissues of different subtypes are distinguishable at the expression level. We conclude that support vector machines with embedded parameter tuning are an effective tool for analyzing RNA-seq gene expression data.
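To illustrate why an unbalanced class representation makes classification results hard to interpret, the short sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall) for a classifier that always predicts the majority class. The 90/10 split is a hypothetical example and does not reflect the actual TCGA sample counts; the function names are likewise illustrative.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical class counts, chosen only to show the effect of imbalance.
y_true = ["smoker"] * 90 + ["non-smoker"] * 10
y_pred = ["smoker"] * len(y_true)      # trivial majority-class classifier

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal_acc:.2f}")
```

In this hypothetical case the trivial classifier reaches 0.90 accuracy while its balanced accuracy is 0.50, the chance level for two classes, so a high raw accuracy on such data says little about whether any mutation pattern was learned.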
Committee
Zhong-Hui Duan, Dr. (Advisor)
Yingcai Xiao, Dr. (Committee Member)
En Cheng, Dr. (Committee Member)
Pages
85 p.
Subject Headings
Bioinformatics; Computer Science
Keywords
SVM; TCGA; Gene Expression; RNA-seq; Classification
Recommended Citations
Zhao, H. (2014). Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning [Master's thesis, University of Akron]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295
APA Style (7th edition)
Zhao, Haitao. Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning. 2014. University of Akron, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295.
MLA Style (8th edition)
Zhao, Haitao. "Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning." Master's thesis, University of Akron, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295
Chicago Manual of Style (17th edition)
Document number: akron1415629295
Download Count: 1,342
Copyright Info
© 2014, all rights reserved.
This open access ETD is published by University of Akron and OhioLINK.