Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering
Sparse matrix/tensor operations have been a common computational motif in a wide spectrum of domains: numerical linear algebra, graph analytics, machine learning, health care, etc. Sparse kernels play a key role in numerous machine learning algorithms, and the rising popularity of this domain increases the significance of primitives like SpMV (Sparse Matrix-Vector Multiplication), SDDMM (Sampled Dense-Dense Matrix Multiplication), MF/TF (Sparse Matrix/Tensor Factorization), etc. These primitives are data-parallel and highly suitable for GPU-like architectures that provide massive parallelism. Real-world matrices and tensors are large-scale, with millions of data points, enough to utilize all the cores of a GPU. Yet a data-parallel algorithm can become the bottleneck of an application and perform far below the upper bound of the roofline model. Common causes are frequent irregular global memory accesses, low data reuse, and imbalanced work distribution.
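As a concrete illustration of these bottlenecks, consider a minimal CSR SpMV kernel that assigns one thread per row. This is a sketch, not code from the dissertation; the array names row_ptr, col_idx, and vals are illustrative.

    // Computes y = A * x for an n_rows-row CSR matrix, one thread per row.
    __global__ void spmv_csr_scalar(int n_rows, const int *row_ptr,
                                    const int *col_idx, const float *vals,
                                    const float *x, float *y) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n_rows) {
            float sum = 0.0f;
            // col_idx[j] is data-dependent, so reads of x scatter across
            // global memory; rows with very different nonzero counts also
            // leave threads with uneven amounts of work.
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                sum += vals[j] * x[col_idx[j]];
            y[row] = sum;
        }
    }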
However, efficient utilization of the GPU memory hierarchy, reduced thread communication, increased data locality, and an even workload distribution can provide ample opportunities for significant performance improvement. The challenge lies in applying these techniques across applications and achieving consistent performance in spite of the irregularity of the input matrices or tensors. In this work, we systematically identify the performance bottlenecks of important sparse algorithms and provide optimized, high-performing solutions.
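For instance, one widely used remedy for the scalar kernel above (again a sketch under the same illustrative naming assumptions, not the dissertation's specific optimization) assigns a warp per row, so that the 32 lanes read consecutive nonzeros with coalesced accesses and combine their partial sums through register shuffles:

    // One warp (32 threads) per row: coalesced reads of vals/col_idx,
    // plus a warp-level reduction that avoids shared-memory traffic.
    __global__ void spmv_csr_vector(int n_rows, const int *row_ptr,
                                    const int *col_idx, const float *vals,
                                    const float *x, float *y) {
        int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
        int lane = threadIdx.x & 31;
        if (warp < n_rows) {
            float sum = 0.0f;
            for (int j = row_ptr[warp] + lane; j < row_ptr[warp + 1]; j += 32)
                sum += vals[j] * x[col_idx[j]];
            // tree reduction within the warp via register shuffles
            for (int off = 16; off > 0; off >>= 1)
                sum += __shfl_down_sync(0xffffffff, sum, off);
            if (lane == 0)
                y[warp] = sum;
        }
    }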
At the beginning of this dissertation, we explore the application of cost-effective ML techniques to the format selection and performance modeling problems in the SpMV domain. By identifying a small set of sparse matrix features to use in training the ML models, we are able to select the best storage format and predict the execution time of an SpMV kernel as well.
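The abstract does not enumerate the feature set; as a purely hypothetical illustration, structural features of this kind can be computed in one pass over the CSR row pointer on the host (the struct and function names below are assumptions, not from the dissertation):

    #include <cmath>

    // Hypothetical scalar features often fed to a format-selection model:
    // matrix density plus mean/standard deviation of nonzeros per row.
    struct SpmvFeatures { double density, mean_nnz_row, std_nnz_row; };

    SpmvFeatures extract_features(int n_rows, int n_cols, const int *row_ptr) {
        int nnz = row_ptr[n_rows];              // total nonzero count
        double mean = double(nnz) / n_rows, var = 0.0;
        for (int i = 0; i < n_rows; ++i) {
            double d = (row_ptr[i + 1] - row_ptr[i]) - mean;
            var += d * d;
        }
        return { double(nnz) / (double(n_rows) * n_cols),
                 mean, std::sqrt(var / n_rows) };
    }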
Next, we optimize the SDDMM kernel, which is a key bottleneck in factorization algorithms.
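SDDMM evaluates the dense product A * B^T only at the nonzero positions of a sparse matrix S, scaling each sampled value by the corresponding entry of S. A minimal sketch over a CSR-format S, one thread per row with illustrative names (A is n_rows x k and B is n_cols x k, both row-major):

    // P shares the sparsity pattern of S: P_ij = S_ij * dot(A_i, B_j).
    __global__ void sddmm_csr(int n_rows, int k, const int *row_ptr,
                              const int *col_idx, const float *s_vals,
                              const float *A, const float *B, float *P) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n_rows) {
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
                int col = col_idx[j];
                float dot = 0.0f;
                for (int t = 0; t < k; ++t)
                    dot += A[row * k + t] * B[col * k + t];
                // sample the dense product at this nonzero of S
                P[j] = s_vals[j] * dot;
            }
        }
    }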
Committee: P. (Saday) Sadayappan (Advisor); Atanas Rountev (Committee Member); Radu Teodorescu (Committee Member)
Subjects: Computer Science