Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering
Replication has been a cornerstone of reliable distributed storage systems for years. Replicating data at multiple locations in the system maintains sufficient redundancy to tolerate individual failures. However, the exploding volume and speed of data growth have pushed researchers and engineers to consider storage-efficient fault-tolerance mechanisms as replacements for replication when designing or re-designing reliable distributed storage systems. One promising alternative to replication is Erasure Coding (EC), which trades extra computation for high reliability and availability at a notably low storage overhead. Consequently, many existing distributed storage systems (e.g., HDFS 3.x, Ceph, QFS, Google Colossus, Facebook f4, and Baidu Atlas) have started to adopt EC to achieve storage-efficient fault tolerance. However, because EC introduces extra computation into these systems, exploiting it raises several crucial challenges, such as how to leverage heterogeneous EC-capable hardware (e.g., CPUs, General-Purpose Graphics Processing Units (GPGPUs), Field-Programmable Gate Arrays (FPGAs), and Smart Network Interface Cards (SmartNICs)) to accelerate EC computation, and how to bring emerging devices and technologies into the picture when designing high-performance erasure-coded distributed storage systems.
In this dissertation, we propose Mint-EC, a high-performance EC framework that addresses these research challenges. Mint-EC rests on three major pillars: 1) a multi-rail EC library that enables upper-layer applications to use heterogeneous EC-capable hardware devices to perform EC operations simultaneously and introduces unified APIs that facilitate overlapping computation with communication, 2) a set of coherent in-network EC primitives that can be easily integrated into existing state-of-the-art EC schemes and used in designing advanced EC schemes to fully leverage the advantages of the coherent in-network EC capabilities on [...]
Committee: Xiaoyi Lu (Advisor); Xiaodong Zhang (Committee Member); Christopher Stewart (Committee Member); Yang Wang (Committee Member)
Subjects: Computer Engineering; Computer Science