Doctor of Philosophy, The Ohio State University, 2016, Electrical and Computer Engineering
Next Generation Sequencing (NGS), a massively parallel, low-cost sequencing
technology, can generate enormous volumes of sequencing data. This facilitates
the discovery of new genomic sequences and expands the biological and medical
research. However, these advances also bring major computational
challenges. In almost all NGS analysis pipelines, the most crucial and computationally
intensive tasks are sequence similarity search and de novo genome assembly. Thus, in this
work, we introduced novel, efficient techniques that exploit advances in
High Performance Computing (HPC) hardware and data computing platforms to
accelerate these tasks while producing high-quality results.
For sequence similarity search, we studied how massively multithreaded
architectures, such as Graphics Processing Units (GPUs), can accelerate and solve
two important problems: read mapping and maximal exact matching. First, we introduced
a new mapping tool, Masher, which processes long (and short) reads efficiently and
accurately. Masher employs a novel indexing technique that builds an index for
large genomes, such as the human genome, with a memory footprint small enough
that it can be stored and efficiently accessed on a memory-restricted device such as a GPU.
The results show that Masher is faster than state-of-the-art tools and achieves good
accuracy and sensitivity on sequencing data with various characteristics. Second,
we studied the maximal exact matching problem because of its importance in
detecting and evaluating the similarity between sequences. We introduced a novel
tool, GPUMEM, which efficiently uses the GPU to build a lightweight index and
find maximal exact matches between two genome sequences. Index construction
is so fast that, even when its time is included, GPUMEM is faster in practice than state-of-the-art
tools that use a pre-built index …
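To make the maximal exact matching problem concrete: a maximal exact match (MEM) between a reference and a query is an identical substring pair, at least some minimum length long, that cannot be extended by one more matching character on either side. The Python sketch below is a brute-force check of that definition only. It is not GPUMEM's GPU-based indexing method, which the abstract does not detail, and the function name `naive_mems` and parameter `min_len` are hypothetical.

```python
from typing import List, Tuple


def naive_mems(ref: str, query: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return every maximal exact match as (ref_pos, query_pos, length).

    Hypothetical brute-force illustration of the MEM definition, running in
    O(|ref| * |query| * match_length) time. It is NOT the GPUMEM algorithm.
    Assumes min_len >= 1.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            # Left-maximal: skip pairs that could be extended one character to the left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right while characters agree; the result is right-maximal.
            k = 0
            while i + k < len(ref) and j + k < len(query) and ref[i + k] == query[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


print(naive_mems("ACGTACGT", "TACG", 3))  # [(0, 1, 3), (3, 0, 4)]
```

The quadratic scan is only practical for toy inputs. Tools that target whole genomes rely on an index over one sequence so that candidate seeds can be found without comparing every position pair.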
Committee: Umit Catalyurek (Advisor); Kun Huang (Committee Member); Fusun Ozguner (Committee Member)
Subjects: Bioinformatics; Computer Engineering