Search Results

(Total results 57)

Search Report

  • 1. Liu, Jiuxing Designing high performance and scalable MPI over InfiniBand

    Doctor of Philosophy, The Ohio State University, 2004, Computer and Information Science

    Rapid technological advances in recent years have made powerful yet inexpensive commodity PCs a reality. New interconnecting technologies that deliver very low latency and very high bandwidth are also becoming available. These developments lead to the trend of cluster computing, which combines the computational power of commodity PCs and the communication performance of high-speed interconnects to provide cost-effective solutions for computation-intensive applications, especially for grand challenge applications such as weather forecasting, air flow analysis, protein searching, and ocean simulation. InfiniBand was recently proposed as the next-generation interconnect for I/O and inter-process communication. Due to its open standard and high performance, InfiniBand is becoming increasingly popular as an interconnect for building clusters. However, since it is not designed specifically for high performance computing, there exists a semantic gap between its functionalities and those required by high performance computing software such as the Message Passing Interface (MPI). In this dissertation, we take on this challenge and address research issues in designing efficient and scalable communication subsystems to bridge this gap. We focus on how to take advantage of the novel features offered by InfiniBand to design different components of the communication subsystems, such as protocol design, flow control, buffer management, communication progress, connection management, collective communication, and multirail network support. Our research has already made notable contributions in the areas of cluster computing and InfiniBand. A large part of our research has been integrated into our MVAPICH software, which is a high performance and scalable MPI implementation over InfiniBand. Our software is currently used by more than 120 organizations worldwide to build InfiniBand clusters, including both research testbeds and production systems.
Some of the fastest supercompute (open full item for complete abstract)

    Committee: Dhabaleswar Panda (Advisor) Subjects: Computer Science
  • 2. Huff, John Performance Characteristics of the Interplanetary Overlay Network in 10 Gbps Networks

    Master of Science (MS), Ohio University, 2021, Computer Science (Engineering and Technology)

    The Interplanetary Internet (IPN) is an architecture for standardized communication between nodes located on or around different celestial bodies. The key concept of the IPN is to use standard Internet protocols within local high-bandwidth, low-latency networks and to interconnect these networks using an "interplanetary backbone" composed of satellites and ground stations communicating using specialized protocols designed for use in low-bandwidth, high-latency networks. This thesis focuses on the performance within local networks constructed for use in an IPN setting. Delay Tolerant Networking (DTN) is a protocol designed to solve the challenges of the IPN. This thesis studies the performance characteristics of the Interplanetary Overlay Network (ION), an implementation of the DTN protocol. A hardware test bench was constructed using two high-performance computers directly connected via a 10 Gbps link. A software tool was devised to test the throughput over this link under various configurations of ION. This testing identified improvements to ION and configuration recommendations that increase the performance of ION in 10 Gbps networks. The main increases in performance were the result of locking the threads of ION to the same CPU core and increasing the shared memory allocation to convergence layer processes. Performance of ION was also studied on a test bench utilizing an ARM A53 processor, which shares the ARMv8 architecture used in High Performance Spaceflight Computing.

    Committee: Shawn Ostermann PhD (Advisor); David Juedes PhD (Committee Member); Harsha Chenji PhD (Committee Member); Julio Arauz PhD (Committee Member) Subjects: Aerospace Engineering; Communication; Computer Science; Information Systems; Information Technology
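The thesis's two main tuning levers, pinning ION's threads to one core and enlarging shared memory, are OS-level settings. Below is a minimal Linux-only sketch of the core-pinning half; `pin_to_core` is an illustrative name, not part of ION:

```python
import os

def pin_to_core(core_id):
    """Pin the calling process to a single CPU core, as the thesis does
    for ION's threads so they share one core's cache instead of bouncing
    between cores."""
    os.sched_setaffinity(0, {core_id})   # 0 = the calling process
    return os.sched_getaffinity(0)       # report the effective mask

print(pin_to_core(0))  # -> {0}
```

ION itself is written in C, where the equivalent mechanism is the `sched_setaffinity(2)` system call, or `taskset(1)` at launch time.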
  • 3. Ali, Nawab Rethinking I/O in High-Performance Computing Environments

    Doctor of Philosophy, The Ohio State University, 2009, Computer Science and Engineering

    As the types of problems we solve in high-performance computing and other areas become more complex, the amount of data generated and used is growing at a rapid rate. Today many terabytes of data are common; tomorrow petabytes of data will be the norm. One of the challenges in high-performance computing is to provide applications with high-speed data access in a distributed, heterogeneous environment. In this dissertation we question the existing I/O paradigms in high-performance computing environments and suggest better alternatives across both local and wide-area networks. We propose three different techniques to accommodate the I/O requirements of scientific applications. We present a new design for a high-performance, scalable parallel file system that obviates the need for dedicated I/O and metadata servers by utilizing object-based storage devices. We also propose a new remote I/O paradigm that takes advantage of the increasing popularity of high-speed networks and centralized data repositories to perform I/O over wide-area networks. Furthermore, we present a scalable I/O forwarding framework that bridges the increasing performance gap between the processing power and the I/O subsystems of massively parallel leadership-class machines such as the IBM Blue Gene/P.

    Committee: P Sadayappan (Committee Chair); Han-Wei Shen (Other); Pete Wyckoff (Other); Gagan Agrawal (Other) Subjects: Computer Science
  • 4. Wu, Jiesheng Communication and memory management in networked storage systems

    Doctor of Philosophy, The Ohio State University, 2004, Computer and Information Science

    Advances in both storage architectures and networking technologies have facilitated storage services built on networked storage systems. Often, as many researchers and developers are acutely aware, hardware and architecture developments that purport to improve performance lack synergy with the software systems they were intended to enhance. New developments in both storage architectures and networking technologies have a profound impact on the design and implementation of networked storage software. In this dissertation, we explore the effects of these advances on the development of networked storage software systems. In particular, we investigate how these advances influence communication and memory management and how to design new communication and memory management schemes that take advantage of them. This dissertation first focuses on communication and memory management in the transport layer of a cluster file system over InfiniBand to make the most of InfiniBand's benefits. This dissertation then presents an integrated communication buffer and cache management scheme. This integrated management not only eliminates redundant memory copying and multiple buffering, which are considered main performance bottlenecks of networked storage systems in general-purpose operating systems, but also enables networked storage software to take full advantage of RDMA in emerging network technologies. This dissertation also introduces a buffering scheme to achieve efficient exclusive caching in the multi-level cache hierarchies often formed in networked storage systems, making better use of memory resources at different cache levels. The main conclusion of this dissertation is that using innovative methods to manage communication and memory can significantly improve the performance and scalability of a networked storage system.
To achieve this requires studying and taking advantage of the new features in the emerging networking technologies and storage a (open full item for complete abstract)

    Committee: Dhabaleswar Panda (Advisor) Subjects: Computer Science
  • 5. Suleman, Mahmoud Junior The Use of High-Performance Computing Services in University Settings: A Usability Case Study of the University of Cincinnati's High-Performance Computing Cluster.

    MS, University of Cincinnati, 2023, Education, Criminal Justice, and Human Services: Information Technology

    Through this study, the University of Cincinnati's Advanced Research Computing Center seeks effective ways to make the High-Performance Computing Cluster accessible across all disciplines on campus and in the Cincinnati Innovation District. To understand the needs of our users, we followed the Nielsen Norman Group's principles for conducting a usability study, which involved a survey and a think-aloud activity to draw a cognitive understanding of our participants' expectations while they performed basic tasks, and we conducted a heuristic evaluation to rate the severity of the issues participants identified. Our findings, which gave a high-level understanding of how the HPC Cluster can be made more accessible across all disciplines regardless of a user's technical skills, pointed to the need to build a customized graphical user interface HPC management portal to serve users' needs. They also pointed to investing in workforce development by introducing an academic credit-based High-Performance Computing course for students and partnering with faculty to introduce special programs, e.g., Student Cluster Competitions, which would draw more student interest.

    Committee: Jess Kropczynski Ph.D. (Committee Chair); Amy Latessa Ph.D. (Committee Member); Shane Halse Ph.D. (Committee Member) Subjects: Information Technology
  • 6. Anderson, Calvin Investigation Of Solid-state Ion Conduction With Stable Silver Isotope Analysis And High Performance Computing

    Doctor of Philosophy, Miami University, 2023, Geology and Environmental Earth Science

    Solid-state ion conduction (SSIC) is a mechanism of electric current which involves the efficient transport of ions through certain crystalline materials. SSIC occurs naturally in the mineral argentite (α-Ag2S) during the growth of wire silver, and can be induced by heating acanthite (β-Ag2S) in a strong thermal gradient. Given that argentite possesses the highest ionic conductivity of any known material, the wire growth process offers a unique opportunity to study the fundamental nature of SSIC. Previous studies noted a relationship between stable Ag isotope fractionation and temperature which warranted further investigation, and thus two sets of growth experiments were devised which enabled the accurate measurement of the thermal gradient during wire formation. Wire silvers were synthesized and subjected to stable Ag isotope analysis, which clarified that the rate of heavy isotope enrichment is an increasing function of the thermal gradient. These observations are potentially relevant for applications in emerging technologies that leverage SSIC, such as atomic switches; tuning the isotopic composition of charge carriers in a device may help optimize certain side effects of ion conduction. In order to better understand the internal ion dynamics of SSIC, a reactive force field (ReaxFF) potential was developed to enable molecular dynamics (MD) simulations of Ag/S-based atomic switches. ReaxFF forcefield optimization is a notoriously difficult task, typically plagued by slow convergence and restricted to training data based on static configurations. A new high-performance optimization algorithm PAGODA was built from the ground up to enable each worker in a parallel genetic algorithm the ability to control arbitrarily parallel instances of the MD engine LAMMPS. This capability makes feasible the use of full MD simulations in the training set, unlocking the door for many new types of training data, including extended crystal structures and multi-phase composites. 
PAGODA (open full item for complete abstract)

    Committee: John Rakovan (Committee Chair); Claire McLeod (Committee Member); Mehdi Zanjani (Committee Member); Ryan Mathur (Committee Member); Mark Krekeler (Committee Member) Subjects: Computer Science; Condensed Matter Physics; Geochemistry
  • 7. Xing, Haoyuan Optimizing array processing on complex I/O stacks using indices and data summarization

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Increasingly, the ability of human beings to understand the universe and ourselves depends on our ability to obtain and process data. With an explosion of data being generated every day, efficiently storing and querying such data, which is usually multidimensional and can be represented using an array data model, is increasingly vital. Meanwhile, as more and more powerful CPUs and accelerators are added to the system, most modern computing systems contain an increasingly complex I/O stack, ranging from traditional disk-based file systems to heterogeneous accelerators with individual memory spaces. Efficiently accessing such a complex I/O stack in array processing is essential to utilize the enormous computational power of modern computational platforms. One key to achieving such efficiency is identifying where the data is being generated or stored, and choosing appropriate representation and processing strategies accordingly. This dissertation focuses on optimizing array processing in such complex I/O stacks by studying these two fundamental questions: what data representation should be used, and where the data should be stored and processed. The two basic scenarios of scientific data analytics are considered one by one. The first half of the dissertation tackles the problem of efficiently processing array data post hoc, presenting a compact array storage format for disk-based data that integrates lossless value-based indexing. Such integrated indices improve the performance of value-based filtering operations by orders of magnitude without sacrificing storage size or accuracy. The dissertation then demonstrates how complex queries such as equal and similarity array joins can also be performed on this novel storage. The second half of the dissertation focuses on data generated by simulations on accelerators in situ, without storing the generated data.
The system generates an improved bitmap representation on GPU to reduce the bandwidth bottleneck between host and accelerat (open full item for complete abstract)

    Committee: Rajiv Ramnath (Advisor); Gagan Agrawal (Advisor); Jason Blevins (Other); Yang Wang (Committee Member); Srinivasan Parthasarathy (Committee Member) Subjects: Computer Engineering; Computer Science
  • 8. Morris, Nathaniel The Modeling and Management of Computational Sprinting

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Sustainable computing, dark silicon, and approximate computing have ushered in a new era in which some processing capacity is available only as ephemeral bursts, a technique called computational sprinting. Computational sprinting speeds up query execution for short bursts by increasing power usage, dropping tasks, scaling precision, and so on. A sprinting policy decides when and how long to sprint. Poor policies inflate response time significantly. However, sprinting alters query executions at runtime, creating a complex dependency between queuing and processing time. Sprinting can speed up query processing and reduce queuing delay, but it is challenging to set efficient policies. As sprinting mechanisms proliferate, system managers will need tools to set policies so that response time goals are met. I provide a method to measure the efficiency of sprinting policies and a framework to create response time models for sprinting mechanisms such as DVFS, CPU throttling, cache allocation, and core scaling. I compared sprinting policies used in competitive solutions with policies found using our models.

    Committee: Christopher Stewart PHD (Advisor); Radu Teodorescu PHD (Committee Member); Xiaorui Wang PHD (Committee Member); Xiaodong Zhang PHD (Committee Member) Subjects: Computer Science
  • 9. Srivastava, Siddhartha MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library

    Master of Science, The Ohio State University, 2021, Computer Science and Engineering

    The Message Passing Interface (MPI) is a popular parallel programming interface for developing scientific applications. These applications rely heavily on MPI for performance. Collective operations like MPI_Allreduce, MPI_Alltoall, and others provide an abstraction for group communication on High-Performance Computing (HPC) systems. MVAPICH2 is a popular open-source high-performance implementation of the MPI standard that provides advanced designs for these collectives through various algorithms. These collectives are highly optimized to provide the best performance on different existing and emerging architectures. To provide the best performance, the right algorithm must be chosen for a collective. Choosing the best algorithm depends on many factors, such as the architecture of the system and the scale at which the application is run. This process of choosing the best algorithm is called tuning the collective. But tuning a collective takes a lot of time, and using static tables may not yield the best performance. To solve this issue, we have designed an "Autotuning Framework". The proposed Autotuning Framework selects the best algorithm for a collective at runtime, without having to rely on previous static tuning of the MVAPICH2 library for the system. Experimental results have shown a performance increase of up to 3X when using the Autotuning Framework version of the MVAPICH2 library versus an untuned MVAPICH2 library for collectives.

    Committee: Dhabaleswar K. Panda (Advisor); Radu Teodorescu (Committee Member); Hari Subramoni (Committee Member) Subjects: Computer Engineering; Computer Science
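The runtime-selection idea, timing each candidate algorithm on the spot instead of consulting a static tuning table, can be sketched outside of MPI. The names below are illustrative; the real framework selects among MVAPICH2's collective algorithms:

```python
import time

def autotune(candidates, payload, trials=3):
    """Return the name of the fastest candidate for this payload,
    measured at runtime rather than looked up in a static table."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(trials):
            fn(payload)
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

def sum_loop(xs):          # a deliberately naive "algorithm"
    total = 0
    for x in xs:
        total += x
    return total

# Two toy reduction "algorithms" standing in for collective variants.
algos = {"builtin": sum, "loop": sum_loop}
choice = autotune(algos, list(range(100_000)))
```

A real system would cache the choice per message size and communicator so that the probing cost is amortized across subsequent calls.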
  • 10. Shi, Haiyang Designing High-Performance Erasure Coding Schemes for Next-Generation Storage Systems

    Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering

    Replication has been a cornerstone of reliable distributed storage systems for years. Replicating data at multiple locations in the system maintains sufficient redundancy to tolerate individual failures. However, the exploding volume and speed of data growth have led researchers and engineers to consider storage-efficient fault tolerance mechanisms to replace replication when designing or re-designing reliable distributed storage systems. One promising alternative to replication is Erasure Coding (EC), which trades extra computation for high reliability and availability at a prominently low storage overhead. Therefore, many existing distributed storage systems (e.g., HDFS 3.x, Ceph, QFS, Google Colossus, Facebook f4, and Baidu Atlas) have started to adopt EC to achieve storage-efficient fault tolerance. However, as EC introduces extra calculations into systems, there are several crucial challenges to think through when exploiting EC, such as how to leverage heterogeneous EC-capable hardware (e.g., CPUs, General-Purpose Graphics Processing Units (GPGPUs), Field-Programmable Gate Arrays (FPGAs), and Smart Network Interface Cards (SmartNICs)) to accelerate EC computation, and how to bring emerging devices and technologies into the picture when designing high-performance erasure-coded distributed storage systems. In this dissertation, we propose Mint-EC, a high-performance EC framework to address the aforementioned research challenges.
Mint-EC includes three major pillars: 1) a multi-rail EC library that enables upper-layer applications to leverage heterogeneous EC-capable hardware devices to perform EC operations simultaneously and introduces unified APIs to facilitate overlapping opportunities between computation and communication, 2) a set of coherent in-network EC primitives that can be easily integrated into existing state-of-the-art EC schemes and utilized in designing advanced EC schemes to fully leverage the advantages of the coherent in-network EC capabilities on (open full item for complete abstract)

    Committee: Xiaoyi Lu (Advisor); Xiaodong Zhang (Committee Member); Christopher Stewart (Committee Member); Yang Wang (Committee Member) Subjects: Computer Engineering; Computer Science
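The storage-vs-computation trade-off EC makes against replication is easiest to see in the single-parity (m = 1) special case. Production systems like those named above use Reed-Solomon RS(k, m) to tolerate multiple failures, but the XOR sketch below shows the mechanics:

```python
def encode(chunks):
    """XOR single-parity encode: k data chunks -> k data + 1 parity.
    One extra chunk of storage tolerates any single chunk loss,
    versus a full extra copy under 2x replication."""
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, c))
    return list(chunks) + [parity]

def recover(chunks, lost_index):
    """Rebuild one lost chunk by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    out = bytes(len(survivors[0]))
    for c in survivors:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out

data = [b"abcd", b"efgh", b"ijkl"]      # k = 3 data chunks
stripe = encode(data)                    # 4 chunks stored
assert recover(stripe, 1) == b"efgh"     # survive loss of chunk 1
```

The extra XOR work on every write is exactly the computation that Mint-EC offloads to EC-capable hardware.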
  • 11. Baheri, Betis MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler

    MS, Kent State University, 2020, College of Arts and Sciences / Department of Computer Science

    In this thesis we introduce a new scheduling algorithm, MARS, based on a cost-aware, multi-scalable reinforcement learning approach, which serves as an intermediate layer between the HPC resource manager and the user application workflow. MARS ensembles the pre-generated models from users' workflows and decides on the most suitable strategy for optimization. A whole workflow application is split into several optimized sub-tasks. Then, based on a pre-defined resource management plan, a reward is generated after executing a scheduled task. Lastly, MARS updates the Deep Neural Network (DNN) model for future use. MARS is designed to optimize existing models through a reinforcement mechanism. MARS can adapt to a shortage of training samples by combining small tasks together or by switching between pre-built scheduling strategies such as backfilling, SJF, etc., and choosing the most suitable approach. After testing MARS using different real-world workflow traces, results show that MARS can achieve 5%-60% better performance than the other approaches.

    Committee: Qiang Guan Dr. (Advisor); Feodor Dragan Dr. (Committee Member); Rouming Jin Dr. (Committee Member) Subjects: Computer Science
  • 12. Kaster, Joshua Training Convolutional Neural Network Classifiers Using Simultaneous Scaled Supercomputing

    Master of Science (M.S.), University of Dayton, 2020, Electrical Engineering

    Convolutional neural networks (CNN) are revolutionizing and improving today's technological landscape at a remarkable rate. Yet even in their success, creating optimally trained networks depends on expensive empirical processing to generate the best results. They require powerful processors, expansive datasets, days of training time, and hundreds of training instances across a range of hyperparameters to identify optimal results. These requirements can be difficult to access for the typical CNN technologist and are ultimately wasteful of resources, since only the best model will be utilized. To overcome these challenges and create a foundation for the next generation of CNN technologists, a three-stage solution is proposed: (1) to cultivate a new dataset containing millions of domain-specific (aerial) annotated images; (2) to design a flexible experiment generator framework which is easy to use, can operate on the fastest supercomputers in the world, and can simultaneously train hundreds of unique CNN networks; and (3) to establish benchmarks of accuracies and optimal training hyperparameters. An aerial imagery database is presented which contains 260 new cultivated datasets, features tens of millions of annotated image chips, and provides several distinct vehicular classes. Accompanying the database, a CNN-training framework is presented which can generate hundreds of CNN experiments with extensively customizable input parameters. It operates across 11 cutting-edge CNN architectures and any Keras-formatted database, and is supported on 3 unique Linux operating systems, including two supercomputers ranked in the top 70 worldwide. Training can be easily performed by simply entering desirable parameter ranges in a pre-formatted spreadsheet. The framework creates unique training experiments for every combination of dataset, hyperparameter, data augmentation, and supercomputer requested.
The resulting hundreds of trained networks provides the performance to perform (open full item for complete abstract)

    Committee: Eric Balster (Committee Chair); Patrick Hytla (Committee Member); Vijayan Asari (Committee Member) Subjects: Artificial Intelligence; Computer Engineering; Computer Science; Electrical Engineering; Engineering
  • 13. Shankar, Dipti Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    With the recent emergence of in-memory computing for Big Data analytics, memory-centric and distributed key-value storage has become vital to accelerating data processing workloads in high-performance computing (HPC) and data center environments. This has led to several research works focusing on advanced key-value store designs with Remote Direct Memory Access (RDMA) and hybrid `DRAM+NVM' storage designs. However, these existing designs are constrained by blocking store/retrieve semantics, incurring additional complexity with the introduction of high data availability and durability requirements. To cater to the performance, scalability, durability, and resilience needs of diverse key-value store-based workloads (e.g., online transaction processing, offline data analytics, etc.), it is therefore vital to fully exploit the resources of modern HPC systems. Moreover, to maximize server scalability and end-to-end performance, it is necessary to design an RDMA-aware communication engine that goes beyond optimizing the key-value store middleware for better client-side latencies. Towards addressing this, in this dissertation, we present a `holistic approach' to designing high-performance, resilient, and heterogeneity-aware key-value storage for HPC clusters, encompassing: (1) RDMA-enabled networking, (2) high-speed NVMs, (3) emerging byte-addressable persistent memory devices, and (4) SIMD-enabled multi-core CPU compute capabilities. We first introduce non-blocking API extensions to the RDMA-Memcached client, which allow an application to separate the request issue and completion phases. This facilitates overlapping opportunities by truly leveraging the one-sided characteristics of the underlying RDMA communication engine, while conforming to the basic Set/Get semantics. Secondly, we analyze the overhead of employing memory-efficient resilience via Erasure Coding (EC) in an online fashion.
Based on this, we extend our proposed RDMA-aware key-valu (open full item for complete abstract)

    Committee: Dhabaleswar K. Panda (Advisor); Xiaoyi Lu (Advisor); Feng Qin (Committee Member); Gagan Agrawal (Committee Member) Subjects: Computer Engineering; Computer Science
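The issue/completion split described in the abstract can be mimicked with futures. The class and method names below are illustrative, not the actual RDMA-Memcached API:

```python
from concurrent.futures import ThreadPoolExecutor

class NonBlockingKV:
    """Toy key-value client whose set/get return futures, separating
    request issue from completion, analogous to the non-blocking API
    extensions the dissertation adds on top of Set/Get semantics."""

    def __init__(self):
        self._store = {}
        self._pool = ThreadPoolExecutor(max_workers=4)

    def iset(self, key, value):
        # Issue phase: returns immediately with a handle.
        return self._pool.submit(self._store.__setitem__, key, value)

    def iget(self, key):
        return self._pool.submit(self._store.get, key)

kv = NonBlockingKV()
handle = kv.iset("k", 42)   # issue; caller may overlap other work here
handle.result()             # completion phase: wait only when needed
assert kv.iget("k").result() == 42
```

Returning a handle from the issue call is what lets the caller overlap communication with computation, mirroring the one-sided RDMA engine underneath.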
  • 14. Nisa, Israt Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    Sparse matrix/tensor operations have been a common computational motif in a wide spectrum of domains: numerical linear algebra, graph analytics, machine learning, health care, etc. Sparse kernels play a key role in numerous machine learning algorithms, and the rising popularity of this domain increases the significance of primitives like SpMV (Sparse Matrix-Vector Multiplication), SDDMM (Sampled Dense-Dense Matrix Multiplication), MF/TF (Sparse Matrix/Tensor Factorization), etc. These primitives are data-parallel and highly suitable for GPU-like architectures that provide massive parallelism. Real-world matrices and tensors are large-scale and have millions of data points, which is sufficient to utilize all the cores of a GPU. Yet a data-parallel algorithm can become the bottleneck of an application and perform well below the upper bound of the roofline model. Some common reasons are frequent irregular global memory access, low data reuse, and imbalanced work distribution. However, efficient utilization of the GPU memory hierarchy, reduced thread communication, increased data locality, and an even workload distribution can provide ample opportunities for significant performance improvement. The challenge lies in applying these techniques across applications and achieving consistent performance despite the irregularity of the input matrices or tensors. In this work, we systematically identify the performance bottlenecks of important sparse algorithms and provide optimized, high-performing solutions. At the beginning of this dissertation, we explore the application of cost-effective ML techniques to the format selection and performance modeling problems in the SpMV domain. By identifying a small set of sparse matrix features to use in training the ML models, we are able to select the best storage format and predict the execution time of an SpMV kernel as well.
Next, we optimize the SDDMM kernel, which is a key bottleneck in fa (open full item for complete abstract)

    Committee: P. (Saday) Sadayappan (Advisor); Atanas Rountev (Committee Member); Radu Teodorescu (Committee Member) Subjects: Computer Science
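The SpMV kernel at the center of the format-selection study is short enough to state exactly. The serial CSR version below fixes the semantics that any optimized GPU variant must preserve; the dissertation's contribution is making the irregular inner loop fast, not the loop itself:

```python
def spmv_csr(indptr, indices, data, x):
    """Sparse matrix-vector multiply, y = A @ x, with A in CSR form.
    The row-dependent inner loop length is the source of the
    imbalanced, low-reuse access pattern discussed in the abstract."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]  stored as CSR (row pointers, column indices, values)
indptr, indices = [0, 2, 3, 5], [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 1.0, 1.0]))  # -> [3.0, 3.0, 9.0]
```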
  • 15. Shanmugam Sakthivadivel, Saravanakumar Fast-NetMF: Graph Embedding Generation on Single GPU and Multi-core CPUs with NetMF

    Master of Science, The Ohio State University, 2019, Computer Science and Engineering

    There is growing interest in learning representations for nodes in a network. Several embedding generation algorithms have been proposed in the last few years that generate high quality representations for downstream tasks like node classification and link prediction. NetMF is one such algorithm; it provides the theoretical foundations for proving that several network representation learning techniques implicitly factorize a closed-form matrix derived from the graph. However, the NetMF algorithm is slow and does not scale well, owing to its multiple dense matrix multiplication steps and Singular Value Decomposition (SVD). We present Fast-NetMF, a fast, highly scalable version of the NetMF algorithm with reduced running time. In this work, we investigate the acceleration of NetMF in single-GPU and multi-core CPU settings. We also investigate replacing the slow SVD-based matrix factorization step with faster, more parallel-friendly factorization techniques like Non-negative Matrix Factorization (NMF).

    Committee: Srinivasan Parthasarathy (Advisor); Sadayappan P (Committee Member) Subjects: Computer Engineering; Computer Science
  • 16. Mosley, Liam Modeling and Phylodynamic Simulations of Avian Influenza

    Master of Science, Miami University, 2019, Computer Science and Software Engineering

    Avian Influenza Viruses (AIV) are highly adaptive and mutate continuously throughout their life-cycle. Subtype H5N1, also known as Highly Pathogenic Asian Avian Influenza, is of particular interest due to its rapid spread from Asia to other countries. Constant mutations in the protein sequences of AIVs cause antigenic drift which leads to the spread of epidemics to livestock, causing billions of dollars in socio-economic losses each year. Consequently, containment of AIV epidemics is of vital importance. Computational approaches to epidemic forecasting, specifically phylodynamic simulations, enhance in vivo analysis by enabling analysis of ecological parameters, evolutionary traits, and the ability to predict antigenic shifts to assist vaccine design. This work introduces an improvement on existing phylodynamic simulations models, called the HASEQ model, by using actual Hemagglutinin (HA) protein sequences, simulating mutations through amino acid substitution models, and implementing an amino-acid level antigenic analysis algorithm to model natural selection pressure. In contrast to prior approaches that rely on abstract representations of virus strains and mutations, HASEQ manipulates and yields actual HA strains to allow for robust validation and direct application of results to inform epidemic containment efforts. The validity of the HASEQ model is assessed via comparisons to WHO Nomenclature refined to represent strains present in 3 high risk countries. The model is calibrated and validated using thousands of simulations with wide-ranging parameter settings requiring over 2,500 hours of computation time. Results show that the model improvements yield results with the expected evolutionary characteristics at the cost of increasing computational run-time costs 10-fold.

    Committee: Dhananjai Rao (Advisor); Eric Rapos (Committee Member); Eric Bachmann (Committee Member) Subjects: Bioinformatics; Biology; Computer Science; Epidemiology
  • 17. Zhang, Jie Designing and Building Efficient HPC Cloud with Modern Networking Technologies on Heterogeneous HPC Clusters

    Doctor of Philosophy, The Ohio State University, 2018, Computer Science and Engineering

    Cloud computing platforms (e.g., Amazon EC2 and Microsoft Azure) have been widely adopted by many users and organizations due to their high availability and scalable computing resources. Using virtualization technology, VM or container instances in a cloud can be constructed on bare-metal hosts for users to run their systems and applications whenever they need computational resources. This has significantly increased the flexibility of resource provisioning in clouds compared to traditional resource management approaches. Cloud computing has recently gained momentum in HPC communities, which raises a broad challenge: how to design and build efficient HPC clouds with modern networking technologies and virtualization capabilities on heterogeneous HPC clusters? Through the convergence of HPC and cloud computing, users gain desirable features such as ease of system management, fast deployment, and resource sharing. However, many HPC applications running on the cloud still suffer from fairly low performance, specifically degraded I/O performance from virtualized I/O devices. Recently, a hardware-based I/O virtualization standard called Single Root I/O Virtualization (SR-IOV) has been proposed to address this problem and can achieve near-native I/O performance. However, SR-IOV lacks locality-aware communication support, so communication across co-located VMs or containers cannot leverage shared-memory-backed communication mechanisms. To deliver high performance to end HPC applications in the HPC cloud, we present a high-performance locality-aware and NUMA-aware MPI library over SR-IOV enabled InfiniBand clusters, which dynamically detects locality information in VM, container, or even nested cloud environments and coordinates data movements appropriately.
The proposed design improves the performance of NAS by up to 43% over the default SR-IOV based scheme across 32 VMs, whi (open full item for complete abstract)
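    The locality-aware dispatch idea can be sketched as a channel-selection rule: peers that share a physical host use shared memory even across VM or container boundaries, while remote peers fall back to the SR-IOV virtual function. This is an illustrative sketch with hypothetical names, not MVAPICH2-Virt code.

```python
def select_channel(my_host, peer_host, sriov_available):
    """Pick a communication channel for an MPI peer (illustrative).

    Co-located peers (same physical host) use a shared-memory
    channel; remote peers use the SR-IOV virtual function when
    present, otherwise a plain TCP fallback.
    """
    if my_host == peer_host:
        return "shared_memory"
    return "sriov_vf" if sriov_available else "tcp"

# Two VMs on host "n01" talk over shared memory; a peer on "n02"
# is reached through the SR-IOV virtual function.
```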

    Committee: Dhabaleswar K. Panda (Advisor); Yang Wang (Committee Member); Stewart Christopher (Committee Member); Sadayappan P (Committee Member); Xiaoyi Lu (Committee Member) Subjects: Computer Engineering; Computer Science
  • 18. Dutta, Soumya In Situ Summarization and Visual Exploration of Large-scale Simulation Data Sets

    Doctor of Philosophy, The Ohio State University, 2018, Computer Science and Engineering

    Recent advancements in computing power have enabled application scientists to design simulation studies using very high-resolution computational models. The output data from such simulations provide a plethora of information that needs to be explored for an enhanced understanding of the underlying phenomena. Large-scale simulations nowadays produce multivariate, time-varying data sets on the order of petabytes and beyond. Traditional post-processing analysis of raw data is no longer readily applicable, since storing all the data is becoming prohibitively expensive; output data size and I/O have become a bottleneck relative to ever-increasing computing speed. Hence, exploration and visualization of such extreme-scale simulation outputs pose significant challenges. This dissertation addresses these issues and suggests an alternative pathway by enabling in situ analysis, i.e., in-place analysis of data while it still resides in supercomputer memory. We embrace in situ technology and adopt simulation-time data analysis, triage, and summarization using various data transformation techniques. The proposed methods process data as the simulation generates it and employ different analysis techniques to extract important data properties efficiently. However, the amount of work that can be done in situ is often limited in time and storage, since overburdening the simulation with additional computation is undesirable. Furthermore, while some application-domain-driven analyses fit well in an in situ environment, a wide range of visual-analytics tasks require longer, iterative exploration during post-processing. Therefore, we conduct in situ statistical data summarization in the form of compact probability distribution functions, which preserve essential statistical data properties and facilitate flexible and scalable post-hoc exploration.
We show that the reduced stati (open full item for complete abstract)
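    The distribution-based summarization idea can be sketched with a per-block normalized histogram: each block of the field is reduced to a compact probability distribution whose storage cost is O(bins) rather than O(samples). This is a minimal illustration of the general technique, not the dissertation's actual summarization pipeline.

```python
from collections import Counter

def summarize_block(values, nbins, vmin, vmax):
    """Summarize a block of scalar field values as a normalized
    histogram -- a compact probability distribution that can be
    stored in situ in place of the raw samples, then queried
    post hoc without the original data."""
    width = (vmax - vmin) / nbins
    counts = Counter()
    for v in values:
        # clamp the top edge into the last bin
        b = min(int((v - vmin) / width), nbins - 1)
        counts[b] += 1
    n = len(values)
    return {b: c / n for b, c in counts.items()}

# Four samples collapse to a sparse dict of bin probabilities.
pdf = summarize_block([0.1, 0.2, 0.25, 0.9], nbins=4, vmin=0.0, vmax=1.0)
```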

    Committee: Han-Wei Shen (Advisor) Subjects: Computer Engineering; Computer Science
  • 19. Srivastava, Rohit Kumar Modeling Performance of Tensor Transpose using Regression Techniques

    Master of Science, The Ohio State University, 2018, Computer Science and Engineering

    Tensor transposition is an important primitive in many tensor algebra libraries. For example, tensor contractions are implemented using the TTGT (Transpose-Transpose-GEMM-Transpose) approach. Performing an efficient transpose of an arbitrary tensor requires different optimization techniques depending on the required permutation, and exhaustive evaluation of all parameter choices, such as slice size and blocking, is prohibitively expensive. We present an approach to model the performance of the different kernels inside TTLG, a Tensor Transpose Library for GPUs, as a function of parameters such as slice size, blocking, and resultant warp efficiency. Predictions made by this model are then used to guide kernel and parameter selection.
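    The TTGT approach named above can be illustrated on a small contraction, C[i,j,k] = Σ_m A[i,m,j]·B[m,k]: transpose A into a layout where the contracted index is innermost, flatten it to a matrix, run a plain GEMM, and reshape the result. This pure-Python sketch shows the data flow only; TTLG's GPU kernels block and tile the transpose, which is exactly what the modeled parameters govern.

```python
def ttgt_contract(A, B, I, M, J, K):
    """Contract C[i][j][k] = sum_m A[i][m][j] * B[m][k] via TTGT."""
    # Transpose: A[i][m][j] -> At[(i*J + j)][m], contracted index last
    At = [[A[i][m][j] for m in range(M)]
          for i in range(I) for j in range(J)]
    # GEMM: (I*J x M) times (M x K)
    C2 = [[sum(At[r][m] * B[m][k] for m in range(M)) for k in range(K)]
          for r in range(I * J)]
    # Reshape the (I*J x K) result back to C[i][j][k]
    return [[[C2[i * J + j][k] for k in range(K)]
             for j in range(J)] for i in range(I)]

# A has shape (I=1, M=2, J=2), B has shape (M=2, K=1).
C = ttgt_contract([[[1, 2], [3, 4]]], [[1], [2]], 1, 2, 2, 1)
```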

    Committee: Ponnuswamy Sadayappan (Advisor); Radu Teodorescu (Committee Member) Subjects: Computer Science
  • 20. Putnam, Patrick Scalable, High-Performance Forward Time Population Genetic Simulation

    PhD, University of Cincinnati, 2018, Engineering and Applied Science: Computer Science and Engineering

    Forward-time population genetic simulators are computational tools used in the study of population genetics. Simulations aim to evolve the genetic state of a population under a set of genetic models that reflect, in various configurations, the processes that occur in nature. Often, these simulations are limited to evolutionary scales that can be represented in memory and feasibly computed on a standard workstation computer. This presents a general challenge: how to represent the genetics of a population so that evolutionary scenarios of sufficient scale can be performed on a memory-constrained system. In addition, as the evolutionary scale increases, so does the computational time necessary to complete the simulation. This work considers the general problems of scale and performance as they relate to forward-time population genetic simulation. It explores the representation of a population as a graph. Improved memory utilization and computational performance are achieved through a binary adjacency matrix representation of the graph, a representation that is generally uncommon in forward-time population genetic simulation. Further performance improvements are made through parallel computation: this work considers forward-time population genetic simulation from both a task- and a data-parallel perspective. Each perspective presents certain challenges and offers different levels of performance gain, and the binary adjacency matrix representation enables both parallel approaches. Finally, although the binary adjacency matrix representation enables improvements in scale and performance, it has limits in forward-time population genetic simulation related to the density of the graph.
This work offers a situation where this representation w (open full item for complete abstract)
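    The binary adjacency matrix idea can be sketched with one bitset per row: each bit records an edge (e.g., whether an individual carries a given mutation), so a row costs one bit per column instead of one pointer per edge, and whole-row bitwise operations replace per-edge loops. This is an illustrative sketch of the representation, not the simulator's implementation; the class and method names are hypothetical.

```python
class BitAdjacency:
    """Dense binary adjacency matrix, one integer bitset per row."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.rows = [0] * rows      # row r is a `cols`-bit integer

    def set_edge(self, r, c):
        self.rows[r] |= 1 << c

    def has_edge(self, r, c):
        return bool(self.rows[r] >> c & 1)

    def shared(self, r1, r2):
        """Count columns common to two rows: popcount of the AND.
        This whole-row bitwise step is what makes the layout fast."""
        return bin(self.rows[r1] & self.rows[r2]).count("1")

# Two individuals over 8 loci; both carry the mutation at column 3.
g = BitAdjacency(2, 8)
g.set_edge(0, 1); g.set_edge(0, 3); g.set_edge(1, 3)
```

Note the density caveat from the abstract: a bit is spent per (row, column) pair whether or not the edge exists, so the layout pays off only when the matrix is dense enough relative to a sparse edge list.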

    Committee: Philip Wilsey Ph.D. (Committee Chair); Fred Beyette Ph.D. (Committee Member); Yizong Cheng Ph.D. (Committee Member); Karen Davis Ph.D. (Committee Member); Ge Zhang Ph.D. (Committee Member) Subjects: Computer Science