Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering
Machine learning is becoming an integral part of everyday life. Therefore, the development of high-performance machine learning algorithms is increasingly significant. The current solution is to use machine learning frameworks such as TensorFlow, PyTorch, and CNTK, which enable the use of specialized architectures such as multi-core CPUs, GPUs, TPUs, and FPGAs. However, while these frameworks offer high productivity, they are not designed for high performance: there is a significant gap between the performance achievable with these frameworks and the peak compute capability of current architectures. Accelerating machine learning algorithms on large-scale data therefore requires architecture-aware algorithm design. Because many machine learning algorithms are computationally demanding, their parallelization has garnered considerable interest. Achieving high performance also makes data locality optimization critical, since the cost of moving data from memory is significantly higher than the cost of performing arithmetic/logic operations on current processors. Yet the design and implementation of new machine learning algorithms has been driven largely by a focus on computational complexity rather than on data movement.
In this dissertation, the parallelization of three widely used machine learning algorithms, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Word2Vec, is addressed with a focus on minimizing data movement overhead through the memory hierarchy, using techniques such as 2D tiling and rearrangement of computation. For each algorithm, a systematic analysis of its data access patterns and data movement is performed, and suitable algorithmic adaptations and parallelization strategies are developed for both multi-core CPU and GPU platforms.
Committee: P. Sadayappan (Advisor); Srinivasan Parthasarathy (Committee Member); Eric Fosler-Lussier (Committee Member)
Subjects: Computer Science