Search Results

(Total results 4)

  • 1. Jose, Jithin. Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware

    Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering

    The computation and communication requirements of modern High-Performance Computing (HPC) and Big Data applications are steadily increasing. HPC scientific applications typically use the Message Passing Interface (MPI) as the programming model; however, there is an increased focus on hybrid MPI+PGAS (Partitioned Global Address Space) models for emerging exascale systems. Big Data applications rely on middleware such as Hadoop (including MapReduce, HDFS, HBase, etc.) and Memcached. It is critical that these middleware be designed with high scalability and performance for next-generation systems. To ensure that HPC and Big Data applications can continue to scale and leverage the capabilities and performance of emerging technologies, a high performance communication runtime is much needed. This thesis focuses on designing a high performance and scalable Unified Communication Runtime (UCR) for HPC and Big Data middleware. In the HPC domain, MPI has been the prevailing communication middleware for more than two decades. Even though it has been successful for regular, iterative applications, it can be very difficult to use MPI and maintain performance for irregular, data-driven applications. The PGAS programming model presents an attractive alternative for designing such applications and provides higher productivity. It is widely believed that parts of applications can be redesigned using PGAS models, leading to hybrid MPI+PGAS applications with improved performance. To fully leverage the performance benefits offered by modern HPC systems, a unified communication runtime that offers the advantages of both the MPI and PGAS programming models is critical. We present "MVAPICH2-X", a high performance and scalable Unified Communication Runtime that supports both MPI and PGAS programming models. This thesis also targets redesigning applications to make use of hybrid programming features for better performance. With our hybrid MPI+PGAS design using Un… (open full item for complete abstract)

    Committee: Dhabaleswar Panda (Advisor); Ponnuswamy Sadayappan (Committee Member); Radu Teodorescu (Committee Member); Karen Tomko (Committee Member) Subjects: Computer Science
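
    To make the hybrid MPI+PGAS style of this entry concrete, below is a minimal hybrid MPI+OpenSHMEM sketch in the spirit of what MVAPICH2-X supports. It assumes a unified runtime (as MVAPICH2-X provides) where both models can be initialized in one program and OpenSHMEM PE numbers match MPI ranks; the program is illustrative, not taken from the thesis.

        /* Hybrid MPI+OpenSHMEM sketch: OpenSHMEM supplies one-sided puts
         * into a symmetric heap, while MPI is used for a collective over
         * the same set of processes. Assumes a unified runtime (such as
         * MVAPICH2-X) where PE numbers and MPI ranks coincide. */
        #include <mpi.h>
        #include <shmem.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            shmem_init();

            int me   = shmem_my_pe();
            int npes = shmem_n_pes();

            /* Symmetric allocation: one inbox slot per peer on every PE. */
            long *inbox = shmem_malloc(npes * sizeof(long));
            inbox[me] = 0;
            shmem_barrier_all();

            /* One-sided, data-driven communication: no matching receive.
             * PE `me` deposits a token into slot `me` on its right neighbor. */
            shmem_long_p(&inbox[me], 100L * me, (me + 1) % npes);
            shmem_barrier_all();            /* make remote puts visible */

            /* Fall back to MPI for a collective over the same processes. */
            long mine = inbox[(me + npes - 1) % npes], sum = 0;
            MPI_Allreduce(&mine, &sum, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
            if (me == 0) printf("sum of tokens = %ld\n", sum);

            shmem_free(inbox);
            shmem_finalize();
            MPI_Finalize();
            return 0;
        }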
  • 2. Potluri, Sreeram. Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

    Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering

    Accelerators (such as NVIDIA GPUs) and coprocessors (such as Intel MIC/Xeon Phi) are fueling the growth of next-generation ultra-scale systems that have high compute density and high performance per watt. However, these many-core architectures make systems heterogeneous by introducing multiple levels of parallelism and varying computation/communication costs at each level. Application developers also use a hierarchy of programming models to extract maximum performance from these heterogeneous systems. Models such as CUDA, OpenCL, and LEO are used to express parallelism across accelerator or coprocessor cores, while higher-level programming models such as MPI or OpenSHMEM are used to express parallelism across a cluster. The presence of multiple programming models, their runtimes, and the varying communication performance at different levels of the system hierarchy has hindered applications from achieving peak performance on these systems. Modern interconnects such as InfiniBand enable asynchronous communication progress through RDMA, freeing up the cores to do useful computation. MPI and PGAS models offer one-sided communication primitives that extract maximum performance, minimize process synchronization overheads, and enable better computation-communication overlap using these high performance networks. However, there is limited literature available to guide scientists in taking advantage of these one-sided communication semantics in high-end applications, more so on heterogeneous clusters. In our work, we present an enhanced model, MVAPICH2-GPU, to use MPI for data movement from both CPU and GPU memories in a unified manner. We also extend the OpenSHMEM PGAS model to support such unified communication. These models considerably simplify data movement in MPI and OpenSHMEM applications running on GPU clusters. We propose designs in the MPI and OpenSHMEM runtimes to optimize data movement on GPU clusters, using state-of-the-art GPU technologies… (open full item for complete abstract)

    Committee: Dhabaleswar K. Panda (Advisor); Ponnuswamy Sadayappan (Committee Member); Radu Teodorescu (Committee Member); Karen Tomko (Committee Member) Subjects: Computer Science
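
    The unified data-movement model this entry describes can be illustrated with a short CUDA-aware MPI sketch: a device pointer is handed directly to MPI_Send/MPI_Recv and the library (for example, MVAPICH2-GPU) pipelines the transfer internally. This is a hedged sketch assuming a CUDA-aware MPI build, not code from the dissertation.

        /* CUDA-aware MPI sketch: device pointers are passed straight to
         * MPI, and the library stages/pipelines the GPU-to-GPU transfer.
         * Requires a CUDA-aware MPI build and one GPU per process;
         * error checking omitted for brevity. */
        #include <mpi.h>
        #include <cuda_runtime.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            const int n = 1 << 20;              /* 1M doubles (8 MB) */
            double *d_buf;
            cudaMalloc((void **)&d_buf, n * sizeof(double));

            if (rank == 0) {
                cudaMemset(d_buf, 0, n * sizeof(double));
                /* No cudaMemcpy to a host staging buffer: the runtime
                 * moves device memory directly. */
                MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }

            cudaFree(d_buf);
            MPI_Finalize();
            return 0;
        }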
  • 3. Tirukkovalur, Sravya. A Global Address Space Approach to Automated Data Management for Parallel Quantum Monte Carlo Applications

    Master of Science, The Ohio State University, 2011, Computer Science and Engineering

    Quantum Monte Carlo (QMC) refers to a large class of computer algorithms that simulate quantum systems in order to solve the quantum many-body problem. Typical parallel QMC applications use very large spline interpolation tables that are unmodified after initialization. Although only a small fraction of the table may be accessed by each parallel thread or process in a window of execution, the accesses are quite random. Hence, current implementations of these methods typically replicate the entire interpolation table at each node of a parallel computer. This limits scalability, since increasing the number of processors does not enable larger systems to be run. In this thesis, we take an automated data management approach that enables existing QMC codes to be adapted with minimal changes to significantly enhance the range of problem sizes that can be run. We primarily use the Global Arrays Partitioned Global Address Space (PGAS) model to provide efficient distributed shared storage, and the implementation is further optimized by intelligent replication, locality, and data-reuse management mechanisms. A transparent software caching mechanism is designed and built on the Global Arrays PGAS programming model to enable QMC codes to overcome their current memory limitations in running large-scale simulations. The new GA read-cache (GRC) has been used to enhance the scalability of QWalk, one of the popular QMC applications.

    Committee: Dr. P. Sadayappan (Advisor); Dr. Srinivasan Parthasarathy (Committee Member) Subjects: Computer Science
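
    The access pattern this thesis targets, a large read-only table distributed with Global Arrays and fetched block-by-block with one-sided gets, can be sketched with the GA C API as below. The GRC caching layer itself is not shown (its interface is internal to the thesis work); the table dimensions and MA_init limits are illustrative assumptions.

        /* Global Arrays sketch: the spline table is stored once across
         * the cluster instead of being replicated per node, and each
         * process fetches just the block it needs with a one-sided get. */
        #include <mpi.h>
        #include "ga.h"
        #include "macdecls.h"

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            GA_Initialize();
            MA_init(C_DBL, 1000000, 1000000);   /* GA's memory allocator */

            int dims[2]  = {100000, 64};        /* rows x spline coefficients */
            int chunk[2] = {-1, -1};            /* let GA pick the distribution */
            int g_table  = NGA_Create(C_DBL, 2, dims, "spline_table", chunk);
            GA_Zero(g_table);                   /* stand-in for table init */
            GA_Sync();

            /* Random-access read: fetch a 256-row block on demand. A caching
             * layer such as the thesis's GRC would keep this block locally
             * and satisfy repeat reads without further communication. */
            static double buf[256][64];
            int lo[2] = {4096, 0}, hi[2] = {4351, 63}, ld[1] = {64};
            NGA_Get(g_table, lo, hi, buf, ld);

            GA_Destroy(g_table);
            GA_Terminate();
            MPI_Finalize();
            return 0;
        }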
  • 4. Larkins, Darrell. Efficient Run-time Support For Global View Programming of Linked Data Structures on Distributed Memory Parallel Systems

    Doctor of Philosophy, The Ohio State University, 2010, Computer Science and Engineering

    Developing high-performance parallel applications that use linked data structures on distributed-memory clusters is challenging. Many scientific applications use algorithms based on linked data structures like trees and graphs. These structures are especially useful in representing relationships between data which may not be known until runtime or may otherwise evolve during the course of a computation. Methods such as n-body simulation, Fast Multipole Methods (FMM), and multiresolution analysis all use trees to represent a fixed space populated by a dynamic distribution of elements. Other problem domains, such as data mining, use both trees and graphs to summarize large input datasets into a set of relationships that captures the information in a form that lends itself to efficient mining. This dissertation first describes a runtime system that offers a programming interface to a global address space representation of generalized distributed linked data structures, while providing scalable performance on distributed-memory computing systems. This system, the Global Chunk Layer (GCL), provides data access primitives at the element level, but takes advantage of coarse-grained data movement to enhance locality and improve communication efficiency. The key benefits of using the GCL system include efficient shared-memory-style programming of distributed, dynamic linked data structures; the abstraction and optimization of structural elements common to linked data; and the ability to customize many aspects of the runtime to tune application performance. Additionally, this dissertation presents the design and implementation of a tree-specific system for efficient parallel global address space computing. The Global Trees (GT) library provides a global view of distributed linked tree structures and a set of routines that operate on these structures. GT is built on top of the generalized data structure support provided by the GCL runtime and… (open full item for complete abstract)

    Committee: P. Sadayappan PhD (Advisor); Atanas Rountev PhD (Committee Member); Paul A.G. Sivilotti PhD (Committee Member) Subjects: Computer Science
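
    To illustrate the global-view idea behind GCL/GT, the following single-process C sketch represents tree links as (chunk, slot) global pointers, with a dereference step standing in for the runtime's chunk fetch and cache. All names here (gptr_t, gcl_deref, and so on) are hypothetical illustrations, not the actual GT/GCL API.

        /* Hypothetical sketch of global-view linked structures: nodes
         * live in fixed-size chunks, links are (chunk, slot) global
         * pointers, and dereferencing stands in for the runtime's
         * coarse-grained chunk fetch. Single-process simulation only. */
        #include <stdio.h>
        #include <stdlib.h>

        #define CHUNK_CAP 64                  /* nodes per chunk */

        typedef struct { int chunk; int slot; } gptr_t;
        static const gptr_t GNULL = { -1, -1 };

        typedef struct {
            double value;
            gptr_t left, right;               /* links are global pointers */
        } node_t;

        typedef struct { node_t nodes[CHUNK_CAP]; int used; } chunk_t;

        static chunk_t *chunks[16];
        static int nchunks;

        /* Stand-in for the runtime: in GCL this would fetch the whole
         * chunk from its owner and cache it, amortizing communication. */
        static node_t *gcl_deref(gptr_t p)
        {
            return &chunks[p.chunk]->nodes[p.slot];
        }

        static gptr_t alloc_node(double v)
        {
            if (nchunks == 0 || chunks[nchunks - 1]->used == CHUNK_CAP)
                chunks[nchunks++] = calloc(1, sizeof(chunk_t));
            chunk_t *c = chunks[nchunks - 1];
            gptr_t p = { nchunks - 1, c->used++ };
            node_t *n = gcl_deref(p);
            n->value = v; n->left = GNULL; n->right = GNULL;
            return p;
        }

        /* Global-view traversal: reads look shared-memory style even
         * though each deref may cross the network in the real system. */
        static double subtree_sum(gptr_t p)
        {
            if (p.chunk < 0) return 0.0;
            node_t *n = gcl_deref(p);
            return n->value + subtree_sum(n->left) + subtree_sum(n->right);
        }

        int main(void)
        {
            gptr_t root = alloc_node(1.0);
            gcl_deref(root)->left  = alloc_node(2.0);
            gcl_deref(root)->right = alloc_node(3.0);
            printf("sum = %.1f\n", subtree_sum(root));   /* prints 6.0 */
            return 0;
        }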