Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering
The past decade has witnessed the success of big data processing frameworks, which provide simple interfaces for parallelizing and scaling applications efficiently. Comparing the designs of MapReduce, Spark, and the Reduction Object paradigm, we found that the design of pattern-based Application Programming Interfaces (APIs) can significantly affect both the types of applications a framework can express and the performance of its middleware. Therefore, by identifying common patterns in popular big data and machine learning applications, we aim to build new frameworks with both expressive interfaces and efficient middleware, achieving better parallelism, locality, programmability, fault tolerance, and application coverage.
To approach this, Chapter 2 studies the impact of API design on the programmability and middleware performance of MapReduce(-like) frameworks. Specifically, we introduce two variations of the original MapReduce API, along with efficient implementations of all three APIs. Through performance comparison and modeling, we find that although MapReduce and similar frameworks offer high programmability, they fall short in performance. We show that Reduction-Object-based APIs, which require only a small additional effort from programmers, can deliver high performance.
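To make the API contrast concrete, the following is a minimal Python sketch, not the dissertation's implementation. All names (`map_reduce`, `reduction_object_run`, and the `wc_*` functions) are hypothetical, and each "chunk" merely stands in for the input handled by one thread or node. The MapReduce-style word count emits and groups intermediate key-value pairs before reducing; the Reduction-Object-style version folds records directly into per-chunk state, and its one extra user-written piece is the `merge` function.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = str  # hypothetical: one input record is one line of text


def map_reduce(
    chunks: Iterable[List[Record]],
    map_fn: Callable[[Record], List[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase: each record emits intermediate (key, value) pairs that
    # are buffered and grouped by key before any reduction happens.
    groups: Dict[str, List[int]] = defaultdict(list)
    for chunk in chunks:
        for record in chunk:
            for key, value in map_fn(record):
                groups[key].append(value)
    # Reduce phase: one call per key over that key's full value list.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def reduction_object_run(
    chunks: Iterable[List[Record]],
    local_reduce: Callable[[Dict[str, int], Record], None],
    merge: Callable[[Dict[str, int], Dict[str, int]], None],
) -> Dict[str, int]:
    # Each chunk folds its records directly into a private reduction
    # object, so no intermediate pairs are stored or grouped.
    partials = []
    for chunk in chunks:
        robj: Dict[str, int] = {}
        for record in chunk:
            local_reduce(robj, record)
        partials.append(robj)
    # Combination phase: the user-supplied merge() folds the per-chunk
    # reduction objects into a single result.
    result: Dict[str, int] = {}
    for robj in partials:
        merge(result, robj)
    return result


# Word count written against both hypothetical APIs.
def wc_map(line: Record) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]


def wc_reduce(key: str, values: List[int]) -> int:
    return sum(values)


def wc_local_reduce(robj: Dict[str, int], line: Record) -> None:
    for word in line.split():
        robj[word] = robj.get(word, 0) + 1


def wc_merge(dst: Dict[str, int], src: Dict[str, int]) -> None:
    for word, count in src.items():
        dst[word] = dst.get(word, 0) + count
```

For the input `[["a b a"], ["b c"]]`, both versions return `{"a": 2, "b": 2, "c": 1}`. The sketch only illustrates why the reduction-object style can avoid materializing and grouping intermediate pairs; it makes no claim about the specific variations or implementations evaluated in Chapter 2.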
Building on this work, Chapter 3 presents a high-throughput stream processing framework that offers users a high-level API (similar to the Reduction Object API), is fault-tolerant, and is more efficient and scalable than current solutions. In particular, it incorporates a cost-efficient, MPI/OpenMP-based fault-tolerance scheme that allows the system to survive node failures with only modest performance degradation. A comparison against state-of-the-art streaming frameworks shows that our system improves throughput on the test cases by up to 10X and parallelizes well when scaled out.
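The abstract does not describe how the fault-tolerance scheme works. As a hedged illustration only, the sketch below shows one generic way a Reduction-Object-style streaming operator can survive a failure: periodically checkpoint the reduction object together with the stream offset, then restore the checkpoint and replay the records that followed it. The class `StreamWorker`, the helper `run_with_failure`, the fixed checkpoint interval, and the assumption of a replayable input source are all hypothetical; this is single-process Python, not the MPI/OpenMP scheme of Chapter 3.

```python
import copy
from typing import Callable, Dict, List, Tuple


class StreamWorker:
    """Hypothetical worker that folds stream records into a reduction object
    and checkpoints it every `interval` records (generic checkpoint/replay)."""

    def __init__(self, local_reduce: Callable[[Dict[str, int], str], None], interval: int):
        self.local_reduce = local_reduce
        self.interval = interval
        self.robj: Dict[str, int] = {}
        self.processed = 0  # number of records folded into robj so far
        # Last checkpoint: (saved reduction object, stream offset it covers).
        self.checkpoint: Tuple[Dict[str, int], int] = ({}, 0)

    def process(self, record: str) -> None:
        self.local_reduce(self.robj, record)
        self.processed += 1
        if self.processed % self.interval == 0:
            # Deep copy so later updates do not mutate the saved state.
            self.checkpoint = (copy.deepcopy(self.robj), self.processed)

    def recover(self) -> int:
        # Restore the last checkpoint and return the offset to replay from.
        saved, offset = self.checkpoint
        self.robj = copy.deepcopy(saved)
        self.processed = offset
        return offset


def count_word(robj: Dict[str, int], word: str) -> None:
    robj[word] = robj.get(word, 0) + 1


def run_with_failure(stream: List[str], fail_after: int, interval: int) -> Dict[str, int]:
    # Assumes the source can replay records from any offset (e.g., a buffer).
    worker = StreamWorker(count_word, interval)
    for record in stream[:fail_after]:
        worker.process(record)
    worker.robj = {}  # simulate losing the in-memory state on a failure
    offset = worker.recover()
    for record in stream[offset:]:
        worker.process(record)
    return worker.robj
```

Because replay restarts exactly at the checkpointed offset, each record is folded into the restored state once, so `run_with_failure(["x", "y", "x", "z", "x", "y", "y"], 5, 3)` yields the same counts as a failure-free run. The checkpoint interval trades checkpointing overhead against the amount of replay needed after a failure.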
In the fast-evolving Internet of Things (IoT) scenario, we envision the potential of leveraging esta…
Committee: Dr. Gagan Agrawal (Advisor); Dr. Radu Teodorescu (Advisor); Dr. Feng Qin (Committee Member); Dr. Christopher Stewart (Committee Member)
Subjects: Computer Engineering; Computer Science