Skip navigation

Search ETDs:

More Like This | More search options

Export: Refworks Refworks | RIS

Smith-Waterman Sequence Alignment For Massively Parallel High-Performance Computing Architectures

PDF Display Full Text | Download Full Text
3.50 MB PDF file

Degree
PHD, Kent State University, College of Arts and Sciences / Department of Computer Science, .
Abstract

This research addresses one of the most often used tools in bioinformatics, sequence alignment. The increasing growth and complexity of high-performance computing as well as the stellar data growth in the bioinformatics field are motivating factors. An associative algorithm for performing quality sequence alignments more efficiently and faster is at the center of the dissertation. SWAMP or Smith-Waterman using Associative Massive Parallelism is the parallel algorithm designed and implemented for the ASC associative SIMD computing model. The theoretical parallel speedup for the algorithm is optimal, and reduces the compute time from O(mn) to O(m+n), where m and n are the length of the input sequences. When m = n, the running time becomes O(n) with a very small constant.

Using the capabilities of ASC, innovative new algorithms that are extensions of the above SWAMP algorithm increase the information returned by the alignment algorithms without decreasing the accuracy of those alignments. Known as SWAMP+, these algorithms that provide a highly sensitive parallelized approach extending traditional pairwise sequence alignment. They are useful for in-depth exploration of sequences, including research in expressed sequence tags, regulatory regions, and evolutionary relationships.

Although the SWAMP suite of algorithms was designed for the associative computing platform, they were implemented on the ClearSpeed CSX 620 parallel SIMD accelerator to obtain realistic metrics. The performance for the compute intensive matrix calculation displayed a speedup of roughly 96 using ClearSpeed's 96 processing elements, thus verifying the possibility of achieving the theoretical speedup mentioned above. Additional parallel hardware implementations were explored and a cluster-based approach to test the memory-intensive Smith-Waterman across multiple nodes within a cluster was used. This work utilized a tool called JumboMem. It allowed us to store the huge matrix of computations completely in memory for what we believe to be one of the largest instances of Smith-Waterman ever run.

Subject Headings
Bioinformatics; Computer science; Molecular biology
Keywords
Sequence Alignment; Smith-Waterman; Parallel Computing; Associative Computing; ASC; SIMD; MIMD; ClearSpeed; JumboMem; Smith-Waterman Using Associative Massive Parallelism; SWAMP; SWAMP+
Committee / Advisors
Johnnie W. Baker, PhD (Advisor)
Kenneth Batcher, PhD (Committee Member)
Paul Farrell, PhD (Committee Member)
James Blank, PhD (Committee Member)
Pages
134p.

Document number: kent1271656353
Permalink:

This ETD has been downloaded 768 times (through March 2013)