Files
MS_Thesis_Radha_Gulhane.pdf (2.05 MB)
Accelerated and Memory-Efficient Distributed Deep Learning: Leveraging Quantization, Parallelism Techniques, and Mix-Match Runtime Communication
Author Info
Gulhane, Radha
ORCID® Identifier
http://orcid.org/0009-0009-2591-1082
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1713381834648517
Abstract Details
Year and Degree
2024, Master of Science, Ohio State University, Computer Science and Engineering.
Abstract
In recent years, Deep Learning (DL) has seen significant research and development due to its efficiency and broad applicability across diverse domains, including Computer Vision and Large Language Models. However, the architecture of large DL models, with their dense layers, makes them compute- and memory-intensive. Distributed Deep Learning (Distributed DL) has emerged as the successful adaptation for accelerating and enabling training and inference of large-scale DL models, encompassing various parallelization approaches, inference and training techniques, and communication optimization strategies to enhance performance. In this thesis, we focus on accelerated and memory-efficient techniques to optimize distributed training and inference, broadly categorized into three approaches:

1. Inference for scaled images using quantization, achieving a speedup of 6.5x with integer-only quantization and 1.58x with half-precision, with less than 1% accuracy degradation.

2. MPI4DL: a Distributed DL parallelism framework encompassing various parallelism techniques, with integral components such as Spatial Parallelism, Bidirectional Parallelism, and Hybrid Parallelism.

3. Communication optimization by leveraging MCR-DL: a distributed module for DL frameworks with support for mixed-backend communication, dynamic selection of the optimal backend, and communication optimization enhancements such as compression and tensor fusion.
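To make the half-precision idea above concrete, here is a minimal PyTorch timing sketch comparing FP32 and FP16 inference on a large input image. It is only an illustration under stated assumptions, not the thesis's MPI4DL or MCR-DL code: the model (torchvision ResNet-50), the 1024x1024 input size, and the availability of a CUDA GPU are assumptions made for the sketch.

# Minimal sketch (assumptions: torchvision ResNet-50, 1024x1024 input, CUDA GPU).
# Illustrates FP32 vs. FP16 inference timing; not the thesis's actual code.
import time
import torch
import torchvision.models as models

def time_inference(model, x, iters=20):
    # Warm up, then average the latency of repeated forward passes.
    with torch.no_grad():
        for _ in range(3):
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

device = "cuda"  # half-precision gains assume GPU execution
x = torch.randn(1, 3, 1024, 1024, device=device)  # large ("scaled") input image, size assumed
model = models.resnet50(weights=None).to(device).eval()

t_fp32 = time_inference(model, x)
t_fp16 = time_inference(model.half(), x.half())   # cast weights and input to FP16
print(f"FP32: {t_fp32 * 1e3:.1f} ms  FP16: {t_fp16 * 1e3:.1f} ms  "
      f"speedup: {t_fp32 / t_fp16:.2f}x")

On GPUs with tensor cores, the FP16 path typically shows a speedup in the same spirit as the 1.58x reported in the abstract; integer-only (INT8) quantization additionally requires a calibration or conversion step that this sketch does not cover.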
Committee
Prof. Dhabaleswar K. Panda (Advisor)
Dr. Aamir Shafi (Committee Member)
Prof. Hari Subramoni (Committee Member)
Pages
63 p.
Subject Headings
Computer Science
Keywords
Deep Learning, Distributed Deep Learning, Quantization, Inference, Parallelism Techniques, Communication Optimization
Recommended Citations
APA Style (7th edition)
Gulhane, R. (2024). Accelerated and Memory-Efficient Distributed Deep Learning: Leveraging Quantization, Parallelism Techniques, and Mix-Match Runtime Communication [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1713381834648517
MLA Style (8th edition)
Gulhane, Radha. Accelerated and Memory-Efficient Distributed Deep Learning: Leveraging Quantization, Parallelism Techniques, and Mix-Match Runtime Communication. 2024. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1713381834648517.
Gulhane, Radha. "Accelerated and Memory-Efficient Distributed Deep Learning: Leveraging Quantization, Parallelism Techniques, and Mix-Match Runtime Communication." Master's thesis, Ohio State University, 2024. http://rave.ohiolink.edu/etdc/view?acc_num=osu1713381834648517
Chicago Manual of Style (17th edition)
Document number:
osu1713381834648517
Download Count:
212
Copyright Info
© 2024, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.