Scalable Clustering for Immune Repertoire Sequence Analysis

Bhusal, Prem

PremBhusalThesis.pdf (2.13 MB)

Author Info

Bhusal, Prem

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=wright1558631347622374

Year and Degree

2019, Master of Science (MS), Wright State University, Computer Science.

Abstract

The development of next-generation sequencing technology has enabled systems immunology researchers to conduct detailed immune repertoire analysis at the molecular level. Large sequence datasets (e.g., millions of sequences) are being collected to comprehensively understand how the immune system of a patient evolves over different stages of disease development. A recent study has shown that the hierarchical clustering (HC) algorithm gives the best results for B-cell clone analysis, an important type of immune repertoire sequencing (IR-Seq) analysis. However, because of its inherent computational complexity, the classical hierarchical clustering algorithm does not scale well to large sequence datasets. Surprisingly, no algorithms have been developed to address this scalability issue for immunology research. In this thesis, we study two different strategies, aiming to find the best scalable methods that preserve the quality of the hierarchical clustering structure: (1) non-Euclidean indexing methods for speeding up classical hierarchical clustering (HC), and (2) a new tree-based sequence summarization approach, SCT, that scans the large sequence dataset once and generates summaries for hierarchical clustering (HC). For comparative analysis, we also experimented with the Spark-based minimum-spanning-tree algorithm (SparkMST), which generates a result equivalent to single-linkage hierarchical clustering (SLINK).

We have implemented all of these algorithms and experimented with real sequence datasets for B-cell clone analysis. The results show that (1) the indexing-enhanced HC (e.g., using the Vantage-Point tree for indexing) preserves the clustering quality very well while also significantly reducing the time complexity of the original HC; (2) SCT with HC is the fastest approximate HC method, at the cost of a slight loss in quality; and (3) SparkMST scales out satisfactorily and gives a significant performance gain with a large Spark cluster.
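The equivalence between a minimum spanning tree and single-linkage clustering mentioned in the abstract can be illustrated with a small single-machine sketch. This is not the thesis's SparkMST implementation. It assumes Levenshtein edit distance as the sequence distance, uses Kruskal's algorithm with a union-find structure, and the names `edit_distance` and `single_linkage_clusters`, the toy sequences, and the threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric), two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def single_linkage_clusters(seqs, threshold):
    """Grow MST edges in increasing order (Kruskal) and stop at `threshold`.

    The connected components that remain are the single-linkage clusters
    obtained by cutting the dendrogram at that distance.
    """
    parent = list(range(len(seqs)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances: O(n^2), fine for a toy example only.
    edges = sorted(
        (edit_distance(seqs[i], seqs[j]), i, j)
        for i, j in combinations(range(len(seqs)), 2)
    )
    for dist, i, j in edges:
        if dist > threshold:
            break  # every remaining edge is longer than the cut
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # this edge belongs to the MST

    groups = {}
    for idx, seq in enumerate(seqs):
        groups.setdefault(find(idx), []).append(seq)
    return sorted(sorted(group) for group in groups.values())


# Made-up toy sequences, not data from the thesis.
toy = ["CARDYW", "CARDFW", "CARDFY", "CTTGGW", "CTTGAW"]
clusters = single_linkage_clusters(toy, threshold=1)
```

With a threshold of 1, the first three toy sequences end up in one cluster even though two of them differ by two edits, because single linkage chains them through a shared neighbor. The quadratic pairwise-distance step in this sketch is the kind of cost that makes the classical approach hard to scale to millions of sequences.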

Committee

Keke Chen, Ph.D. (Advisor)
Krishnaprasad Thirunarayan, Ph.D. (Committee Member)
Tanvi Banerjee, Ph.D. (Committee Member)

Pages

50 p.

Subject Headings

Computer Science

Keywords

Clustering; Immune-Repertoire; Sequence; Hierarchical Clustering

Recommended Citations

APA Style (7th edition)
Bhusal, P. (2019). Scalable Clustering for Immune Repertoire Sequence Analysis [Master's thesis, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1558631347622374

MLA Style (8th edition)
Bhusal, Prem. Scalable Clustering for Immune Repertoire Sequence Analysis. 2019. Wright State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1558631347622374.

Chicago Manual of Style (17th edition)
Bhusal, Prem. "Scalable Clustering for Immune Repertoire Sequence Analysis." Master's thesis, Wright State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1558631347622374.

Document number:

wright1558631347622374

Download Count:

324
