Search Results (1 - 11 of 11 Results)

Potluri, Sreeram. Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects
Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering
Accelerators (such as NVIDIA GPUs) and coprocessors (such as Intel MIC/Xeon Phi) are fueling the growth of next-generation ultra-scale systems that have high compute density and high performance per watt. However, these many-core architectures cause systems to be heterogeneous by introducing multiple levels of parallelism and varying computation/communication costs at each level. Application developers also use a hierarchy of programming models to extract maximum performance from these heterogeneous systems. Models such as CUDA, OpenCL, LEO, and others are used to express parallelism across accelerator or coprocessor cores, while higher-level programming models such as MPI or OpenSHMEM are used to express parallelism across a cluster. The presence of multiple programming models, their runtimes, and the varying communication performance at different levels of the system hierarchy has hindered applications from achieving peak performance on these systems. Modern interconnects such as InfiniBand enable asynchronous communication progress through RDMA, freeing up the cores to do useful computation. MPI and PGAS models offer one-sided communication primitives that extract maximum performance, minimize process synchronization overheads, and enable better computation and communication overlap using the high performance networks. However, there is limited literature available to guide scientists in taking advantage of these one-sided communication semantics in high-end applications, more so on heterogeneous clusters. In our work, we present an enhanced model, MVAPICH2-GPU, to use MPI for data movement from both CPU and GPU memories in a unified manner. We also extend the OpenSHMEM PGAS model to support such unified communication. These models considerably simplify data movement in MPI and OpenSHMEM applications running on GPU clusters. We propose designs in MPI and OpenSHMEM runtimes to optimize data movement on GPU clusters, using state-of-the-art GPU technologies such as CUDA IPC and GPUDirect RDMA. Further, we introduce PRISM, a proxy-based multi-channel framework that enables an optimized MPI library for communication on clusters with Intel Xeon Phi co-processors. We evaluate our designs using micro-benchmarks, application kernels, and end-applications. We present the re-design of a petascale seismic modeling code to demonstrate the use of one-sided semantics in end-applications and their impact on performance. We finally demonstrate the benefits of using one-sided semantics on heterogeneous clusters.
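
As a concrete illustration of the unified communication model described above, the following minimal sketch passes GPU device pointers directly to MPI calls, assuming a CUDA-aware MPI build such as MVAPICH2-GPU; the buffer size, two-rank layout, and compile line are illustrative only, not the thesis's code.

    /* Minimal sketch: a GPU device pointer handed straight to MPI, as a
     * CUDA-aware MPI library (e.g. MVAPICH2-GPU) allows. The runtime handles
     * staging, CUDA IPC, or GPUDirect RDMA internally.
     * Illustrative compile line: mpicc gpu_sendrecv.c -lcudart */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;                  /* 1M floats, illustrative size */
        float *d_buf;
        cudaMalloc((void **)&d_buf, (size_t)n * sizeof(float));
        cudaMemset(d_buf, 0, (size_t)n * sizeof(float));

        /* Device pointer used directly; no explicit device-to-host copy. */
        if (rank == 0)
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }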

Committee:

Dhabaleswar K. Panda (Advisor); Ponnuswamy Sadayappan (Committee Member); Radu Teodorescu (Committee Member); Karen Tomko (Committee Member)

Subjects:

Computer Science

Keywords:

Heterogeneous Clusters; GPU; MIC; Many-core Architectures; MPI; PGAS; One-sided; Communication Runtimes; InfiniBand; RDMA; Overlap; HPC Applications

Singh, Ashish Kumar. Optimizing All-to-All and Allgather Communications on GPGPU Clusters
Master of Science, The Ohio State University, 2012, Computer Science and Engineering

High Performance Computing (HPC) is rapidly becoming an integral part of Science, Engineering, and Business. Scientists and engineers are leveraging HPC solutions to run their applications that require high bandwidth, low latency, and very high compute capabilities. General Purpose Graphics Processing Units (GPGPUs) are becoming more popular within the HPC community because of their highly parallel structure, which enables applications to achieve multi-fold performance gains. The Tianhe-1A and Tsubame systems received significant attention for their architectures that leverage GPGPUs. Increasingly, many scientific applications that were originally written for CPUs using MPI for parallelism are being ported to these hybrid CPU-GPU clusters. Traditionally, CPUs perform computation while the MPI library takes care of communication. When computation is performed on GPGPUs, the data has to be moved from device memory to main memory before it can be used in communication. Though GPGPUs provide huge compute potential, the data movement to and from GPGPUs is both a performance and a productivity bottleneck. Recently, the MVAPICH2 MPI library has been modified to directly support point-to-point MPI communication from GPU memory [33]. Using this support, programmers do not need to explicitly move data to main memory before using MPI. This feature also enables performance improvements due to the tight integration of GPU data movement and MPI internal protocols.

Collective communication is commonly used in HPC applications. These applications spend a significant portion of their time performing such collective communications. Therefore, optimizing the performance of collectives has a significant impact on applications' overall performance. The all-to-all and allgather communication operations in message-passing systems are heavily used collectives that have O(N²) communication for N processes. In this thesis, we outline the major design alternatives for these two collective communication operations on GPGPU clusters. We propose efficient and scalable designs and provide a corresponding performance analysis. Using our dynamic staging techniques, the latency of MPI_Alltoall on GPGPU clusters can be improved by 59% over a naive implementation and 44% over a Send-Recv based implementation for 32 KByte messages on 32 processes. Our proposed design, Fine Grained Pipeline, can improve the performance of MPI_Allgather on GPGPU clusters by 46% over the naive design and 81% over the Send-Recv based design for a message size of 16 KBytes on 64 processes. The proposed designs have been incorporated into the open-source MPI stack, MVAPICH2.
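
For context, the sketch below shows the kind of "naive" staged all-to-all these designs are compared against, in which data is explicitly copied between device and host around MPI_Alltoall; with a CUDA-aware MPI library the device buffers could instead be passed directly and the copies pipelined inside the runtime. The function, message size, and buffer handling are illustrative, not the thesis's implementation.

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    /* "Naive" staged all-to-all: copy GPU data to the host, exchange with
     * MPI_Alltoall, copy the result back to the GPU. A CUDA-aware MPI would
     * let d_send/d_recv be passed to MPI_Alltoall directly instead. */
    void staged_alltoall(float *d_send, float *d_recv, int count, MPI_Comm comm)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);
        size_t bytes = (size_t)count * nprocs * sizeof(float);

        float *h_send = (float *)malloc(bytes);
        float *h_recv = (float *)malloc(bytes);

        cudaMemcpy(h_send, d_send, bytes, cudaMemcpyDeviceToHost); /* stage in */
        MPI_Alltoall(h_send, count, MPI_FLOAT, h_recv, count, MPI_FLOAT, comm);
        cudaMemcpy(d_recv, h_recv, bytes, cudaMemcpyHostToDevice); /* stage out */

        free(h_send);
        free(h_recv);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int nprocs;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int count = 8192;              /* 32 KB of floats per peer */
        float *d_send, *d_recv;
        cudaMalloc((void **)&d_send, (size_t)count * nprocs * sizeof(float));
        cudaMalloc((void **)&d_recv, (size_t)count * nprocs * sizeof(float));

        staged_alltoall(d_send, d_recv, count, MPI_COMM_WORLD);

        cudaFree(d_send);
        cudaFree(d_recv);
        MPI_Finalize();
        return 0;
    }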

Committee:

Dhabaleswar K. Panda, Professor (Advisor); P. Sadayappan, Professor (Committee Member)

Subjects:

Computer Science

Keywords:

GPGPU; HPC; Infiniband; Collective Communications; MPI

Zhou, Xuan. RhoA GTPase Controls Cytokinesis and Programmed Necrosis of Hematopoietic Progenitors
PhD, University of Cincinnati, 2013, Medicine: Molecular and Developmental Biology
In this thesis work, the function of RhoA GTPase in regulating hematopoiesis and hematopoietic stem cells (HSCs) was defined using a conditional knockout mouse model. We demonstrated that RhoA was critical for multi-lineage differentiation in the hematopoietic system and that the deletion of RhoA resulted in hematopoietic failure and rapid lethality. Through congenic and reverse transplantation experiments, we determined that RhoA expressed within the hematopoietic lineages was essential for hematopoiesis. In the primitive HSC and progenitor (HSPC) compartment, RhoA was essential for the generation and survival of progenitor cells, but surprisingly dispensable for long-term maintenance of the HSC population. With a gene expression analysis and a RhoA add-back rescue experiment, we showed that RhoA-null HSCs maintained a multi-lineage differentiation potential. Consistent with studies in other tissues, we observed a major cytokinesis block after telophase, which may reflect defects in the final abscission of the two daughter cells. Furthermore, by overexpressing effector-binding mutants of RhoA, we showed that RhoA-ROCK and RhoA-mDia interactions were critical for HSPC cytokinesis. During our cytokinesis investigations, we observed a reduction in myosin signaling, the driving force of the cell-separation process. However, actin polymerization, a process integral to cell division, was not reduced by RhoA deficiency. Another interesting aspect of this study is that RhoA deficiency led to an increase in programmed necrosis, rather than apoptosis or autophagy. This increase in necrosis was observed specifically in the committed progenitor population, coinciding with the loss of progenitors in the competitive transplant experiment. This study provides conclusive evidence for the function of RhoA in the primitive hematopoietic compartment and also offers several novel insights for future studies.

Committee:

Yi Zheng, Ph.D. (Committee Chair); Hartmut Geiger, Ph.D. (Committee Member); Gang Huang, Ph.D. (Committee Member); Jose Cancelas-Perez, M.D. (Committee Member); James Mulloy, Ph.D. (Committee Member)

Subjects:

Developmental Biology

Keywords:

RhoA; HSC; HPC; cytokinesis; necrosis

AlMulhem, Norah. Cryopreservation and Hypothermal Storage of Hematopoietic Stem Cells
MS, University of Cincinnati, 2015, Allied Health Sciences: Transfusion and Transplantation Sciences
The recent commercial availability of storage media (CryoStor™ and HypoThermosol™) designed for optimal long-term and short-term hematopoietic stem cell (HSC) storage prompted an evaluation of hematopoietic stem cell and hematopoietic progenitor cell (HSC/P) viability and functionality after storage in these media formulations, compared with the conventional media used at Hoxworth Blood Center. Three human umbilical cord blood units (CBUs) were cryopreserved in CryoStor5 (CS5), CryoStor10 (CS10), and a conventional internally prepared cryopreservation medium, then analyzed post-thaw for viability and recovery of several mature and immature hematopoietic cell types, as well as for clonogenic capacity and proliferation potential. Flow cytometric analysis indicated similar post-thaw viability of most cell subsets cryopreserved in CS5 and CS10 compared with the conventional cryopreservation medium (containing 5% dimethyl sulfoxide (DMSO) and 2.5% hydroxyethyl starch). This variation in viability was not statistically significant (p-values 0.2 to 1). Bromodeoxyuridine (BrdU) uptake was used to measure the ability of the frozen/thawed cells to proliferate in culture for 48 h in response to stem cell factor (SCF), Flt-3 ligand (Flt-3), and thrombopoietin (TPO). Proliferation potential and clonogenic capacity were both slightly better after freezing in CS10; however, the differences were not statistically significant. This study shows that the conventional medium for cryopreservation used in our laboratory is similarly effective, compared with CS5 or CS10 media, in protecting the cryopreserved CBU-derived HSC/P products. The same analytical methods were used to compare HypoThermosol® (HTS-FRS®), which is designed for short-term refrigerated storage of hematopoietic cells, to a locally prepared medium containing Plasma-Lyte A and 0.5% human serum albumin (HSA). Measurements were performed after 24, 48 and 72 h of cold storage (4°C). Results showed similar viability and recovery after 24 h of storage, but after 48 and 72 h, a significant decline in viability occurred in a few of the subsets when stored in the Plasma-Lyte A/HSA medium, compared to storage in HTS-FRS®. Differences in clonogenic capacity and proliferation potential were not significant; however, the cells' proliferation potential was slightly better after storage in HTS-FRS®. Taken together, these results indicate that the HTS-FRS® storage medium preserves hematopoietic cell function better than Plasma-Lyte A/0.5% HSA, especially if the cells are to be stored for more than 24 hours. It is possible that these in-vitro results could translate to improved engraftment after storing umbilical cord blood, bone marrow or mobilized peripheral blood in these new media.

Committee:

Thomas Leemhuis, Ph.D. (Committee Chair); Jose Cancelas-Perez, M.D. (Committee Member); Patricia Morgan Carey, M.D. (Committee Member); Carolyn Lutzko, Ph.D. (Committee Member)

Subjects:

Health Sciences

Keywords:

Cryopreservation; Hypothermal storage; HPC P; CryoStor; HypoThermosol

Raveendran, Aarthi. A Framework For Elastic Execution of Existing MPI Programs
Master of Science, The Ohio State University, 2011, Computer Science and Engineering
There is a clear trend towards using cloud resources in the scientific and HPC communities, with a key attraction of the cloud being the elasticity it offers. When executing HPC applications in a cloud environment, it is clearly desirable to exploit this elasticity and increase or decrease the number of instances an application runs on during its execution, to meet time and/or cost constraints. Unfortunately, HPC applications have almost always been designed to use a fixed number of resources. This work focuses on making existing MPI applications elastic for a cloud framework. Considering the limitations of the MPI implementations currently available, we support adaptation by terminating one execution and restarting the program on a different number of instances. The components of the system include a decision layer that considers time and cost constraints, a framework for modifying MPI programs, and cloud-based runtime support that can redistribute saved data and automate resource allocation and application restart on a different number of nodes. Using two MPI applications, we demonstrate the feasibility of our approach and show that outputting, redistributing, and reading back data is a reasonable way to make existing MPI applications elastic. The decision layer, built around a feedback model, monitors the application by interacting with it at regular intervals and performs scaling with the assistance of the resource allocator when necessary. This approach is tested using the same two applications and is used to meet user-specified limits on execution time or budget.
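
A hypothetical sketch of the save/redistribute/restart cycle is shown below: each rank writes its block of a global array to a shared snapshot with MPI-IO, and a restarted run on a different rank count reads back the blocks it now owns. The file name, problem size, and block decomposition are illustrative, not the framework's actual interfaces.

    #include <mpi.h>
    #include <stdlib.h>

    #define GLOBAL_N 1048576   /* illustrative global array length */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Block decomposition depends only on the current rank count, so the
         * same code works before and after the instance count changes. */
        int base = GLOBAL_N / size, extra = GLOBAL_N % size;
        int local_n = base + (rank < extra ? 1 : 0);
        MPI_Offset offset = (MPI_Offset)rank * base + (rank < extra ? rank : extra);
        double *local = malloc(local_n * sizeof(double));

        MPI_File fh;
        /* Restart path: read back the slice this incarnation now owns. */
        if (MPI_File_open(MPI_COMM_WORLD, "snapshot.bin", MPI_MODE_RDONLY,
                          MPI_INFO_NULL, &fh) == MPI_SUCCESS) {
            MPI_File_read_at_all(fh, offset * sizeof(double), local, local_n,
                                 MPI_DOUBLE, MPI_STATUS_IGNORE);
            MPI_File_close(&fh);
        }

        /* ... iterations of the original MPI application would run here ... */

        /* When the decision layer requests a rescale, write the slice and exit;
         * the runtime then restarts the job on a different number of nodes. */
        MPI_File_open(MPI_COMM_WORLD, "snapshot.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, offset * sizeof(double), local, local_n,
                              MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        free(local);
        MPI_Finalize();
        return 0;
    }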

Committee:

Gagan Agrawal, Dr. (Advisor); Christopher Stewart (Committee Member)

Subjects:

Computer Science

Keywords:

MPI; elastic; HPC; cloud; Amazon EC2

Raja Chandrasekar, Raghunath. Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters
Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering
In high-performance computing (HPC), tightly-coupled, parallel applications run in lock-step over thousands to millions of processor cores. These applications simulate a wide range of scientific phenomena, such as hurricanes and earthquakes, or the functioning of a human heart. The results of these simulations are important and time-critical, e.g., we want to know the path of the hurricane before it makes landfall. Thus, these applications are run on the fastest supercomputers in the world at the largest scales possible. However, due to the increased component count, large-scale executions are more prone to experience faults, with Mean Times Between Failures (MTBF) on the order of hours or days due to hardware breakdowns and soft errors. A vast majority of current-generation HPC systems and application codes work around system failures using rollback-recovery schemes, also known as Checkpoint-Restart (CR), wherein the parallel processes of an application frequently save a mutually agreed-upon state of their execution as checkpoints in a globally-shared storage medium. In the face of failures, applications roll back their execution to a fault-free state using these snapshots that were saved periodically. Over the years, checkpointing mechanisms have gained notoriety for their colossal I/O demands. While state-of-the-art parallel file systems are optimized for concurrent accesses from millions of processes, checkpointing overheads continue to dominate application run times, with a single checkpoint taking on the order of tens of minutes to hours to write. On future systems, checkpointing activities are predicted to dominate compute time and overwhelm file system resources. On supercomputing systems geared for Exascale, parallel applications will have a wider range of storage media to choose from: on-chip/off-chip caches, node-level RAM, Non-Volatile Memory (NVM), distributed RAM, flash storage (SSDs), HDDs, parallel file systems, and archival storage. Current-generation checkpointing middleware and frameworks are oblivious to this storage hierarchy, in which each medium has unique performance and data-persistence characteristics. This thesis proposes a cross-layer framework that leverages this hierarchy in storage media to design scalable and low-overhead fault-tolerance mechanisms that are inherently I/O bound. The key components of the framework include: CRUISE, a highly scalable in-memory checkpointing system that leverages both volatile and Non-Volatile Memory technologies; Stage-FS, a light-weight data-staging system that leverages burst buffers and SSDs to asynchronously move application snapshots to a remote file system; Stage-QoS, a file-system-agnostic Quality-of-Service mechanism for data-staging systems that minimizes network contention; MIC-Check, a distributed checkpoint-restart system for coprocessor-based supercomputing systems; Power-Check, an energy-efficient checkpointing framework for transparent and application-aware HPC checkpointing systems; and FTB-IPMI, an out-of-band fault-prediction mechanism that proactively monitors for failures. The components of this framework have been evaluated up to a scale of three million compute processes, have reduced the checkpointing overhead on scientific applications by a factor of 30, and have reduced the amount of energy consumed by checkpointing systems by up to 48%.
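
The hierarchy-aware idea can be sketched as follows, under the assumption (not taken from the thesis code) that a checkpoint is first written to fast node-local storage and then drained to the parallel file system; the paths and the blocking byte copy are placeholders for what CRUISE and Stage-FS do asynchronously with memory-backed storage and burst buffers.

    #include <stdio.h>
    #include <stdlib.h>

    /* Level 1: fast, node-local write (here an in-memory file system). */
    static void write_checkpoint(const char *path, const double *state, size_t n)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return;
        fwrite(state, sizeof(double), n, f);
        fclose(f);
    }

    /* Level 2: drain the local checkpoint to the shared parallel file system.
     * A real staging system would do this asynchronously via burst buffers. */
    static void stage_to_pfs(const char *local, const char *pfs)
    {
        FILE *in = fopen(local, "rb"), *out = fopen(pfs, "wb");
        if (!in || !out) return;
        char buf[1 << 16];
        size_t got;
        while ((got = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, got, out);
        fclose(in);
        fclose(out);
    }

    int main(void)
    {
        size_t n = 1 << 20;                        /* illustrative state size */
        double *state = calloc(n, sizeof(double));
        write_checkpoint("/dev/shm/ckpt.0", state, n);      /* node-local tier */
        stage_to_pfs("/dev/shm/ckpt.0", "/lustre/ckpt.0");   /* hypothetical PFS path */
        free(state);
        return 0;
    }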

Committee:

Dhabaleswar Panda (Advisor); Ponnuswamy Sadayappan (Committee Member); Radu Teodorescu (Committee Member); Kathryn Mohror (Committee Member)

Subjects:

Computer Engineering; Computer Science

Keywords:

fault-tolerance; resilience; checkpointing; process-migration; Input-Output; HPC; supercomputing; MPI; MVAPICH; accelerators; energy-efficiency

GANDHI, SACHIN. AN FPGA IMPLEMENTATION OF FDTD CODES FOR RECONFIGURABLE HIGH PERFORMANCE COMPUTING
MS, University of Cincinnati, 2004, Engineering : Computer Engineering
Finite-difference time-domain (FDTD) codes are used in modeling RF signatures and electronic coupling. Despite improvements in large-scale modeling, simulation codes, and the acquisition of powerful high-performance computing (HPC) platforms, simulations of such scientific problems require still more powerful computers. The advances in Field-Programmable Gate Array (FPGA) chips and FPGA-based co-processor (FPGA-CP) boards offer the potential for accelerating current and future HPC platforms. Higher-capacity FPGAs offer an opportunity for implementing floating-point applications that were previously infeasible. In this thesis, I have investigated the feasibility of using FPGA-CP acceleration for FDTD simulation. The FDTD method has been chosen for this case study due to its simple computation kernel and abundant parallelism. A floating-point solver for electromagnetic simulation has been implemented. Results achieved in this thesis demonstrate the feasibility of high throughput and good circuit density through the use of architectural features such as the 18-bit block multipliers provided by modern-day FPGAs.
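
For reference, the simple computation kernel mentioned above looks like the following one-dimensional FDTD update in C; the grid size, Courant factor, and hard-wired Gaussian source are illustrative and unrelated to the thesis's actual FPGA implementation.

    #include <math.h>
    #include <stdio.h>

    #define NX 400
    #define NSTEPS 1000

    int main(void)
    {
        static double ez[NX], hy[NX];   /* electric and magnetic field arrays */

        for (int t = 0; t < NSTEPS; t++) {
            /* Update H from the curl of E: one multiply-add per cell; this
             * regular inner loop is what an FPGA pipeline can accelerate. */
            for (int i = 0; i < NX - 1; i++)
                hy[i] += 0.5 * (ez[i + 1] - ez[i]);

            /* Update E from the curl of H. */
            for (int i = 1; i < NX; i++)
                ez[i] += 0.5 * (hy[i] - hy[i - 1]);

            /* Soft source: a Gaussian pulse injected near the left boundary. */
            ez[10] += exp(-(t - 30.0) * (t - 30.0) / 100.0);
        }

        printf("ez[NX/2] after %d steps: %g\n", NSTEPS, ez[NX / 2]);
        return 0;
    }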

Committee:

Dr. Karen Tomko (Advisor)

Keywords:

FPGA; FDTD; HPC; hardware implementation; Finite Difference Time-Domain; Reconfigurable High Performance Computing; floating-point implementation

Samsi, Siddharth Sadanand. Computer Aided Analysis of IHC and H&E Stained Histopathological Images in Lymphoma and Lupus
Doctor of Philosophy, The Ohio State University, 2012, Electrical and Computer Engineering

The use of computers in medical image analysis has seen tremendous growth following the development of imaging technologies that can capture image data in-vivo as well as ex-vivo. While the field of radiology has adopted computer-aided image analysis in research as well as clinical settings, the use of similar techniques in histopathology is still in a nascent stage. The current gold standard in diagnosis involves labor-intensive tasks such as cell counting and quantification for disease diagnosis and characterization. This process can be subjective and affected by human factors such as reader bias and fatigue. Computer-based tools such as digital image analysis have the potential to help alleviate some of these problems while also offering insights that may not be readily apparent when viewing glass slides under an optical microscope. Commercially available high-resolution slide scanners now make it possible to obtain images of whole slides scanned at 40x microscope resolution. Additionally, advanced tools for scanning tissue images at 100x resolution are also available. Such scanning tools have led to a large amount of research focused on the development of image analysis techniques for histopathological images. While the availability of high-resolution image data presents innumerable research opportunities, it also leads to several challenges that must be addressed.

This dissertation explores some of the challenges associated with computer-aided analysis of histopathological images. Specifically, we develop a number of tools for Follicular Lymphoma and Lupus. We aim to develop algorithms for detection of salient features in tissue biopsies of follicular lymphoma tumors. We analyze the algorithms from a computational point of view and develop techniques for processing whole slide images efficiently using high performance computing resources. In the application of image analysis for Lupus, we analyze mouse renal biopsies for characterizing the distribution of infiltrates in tissue as well as develop algorithms for identification of tissue components such as the glomeruli, which play a significant role in the diagnosis of the disease. Finally, we explore the development of a web-based system for dissemination of high-resolution images of tissues with the goal of advancing collaboration, research and teaching. Through the use of web technologies and developments in the field of geospatial imaging, we demonstrate the efficacy of an online tissue repository that can enable pathologists, medical students and all researchers to explore these images as well as use high performance computing systems to leverage computer-aided diagnosis tools in their field.
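
A hypothetical sketch of the whole-slide processing pattern described above is given below: the gigapixel image is treated as a grid of tiles whose indices are dealt round-robin to MPI ranks, each of which would read and analyze its tiles independently on an HPC system. Slide dimensions and tile size are invented, and the actual slide I/O and detection algorithms are omitted.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long width = 80000, height = 60000;  /* illustrative 40x scan */
        const long tile = 2048;
        long tiles_x = (width + tile - 1) / tile;
        long tiles_y = (height + tile - 1) / tile;
        long total = tiles_x * tiles_y;

        long mine = 0;
        for (long t = rank; t < total; t += size) {
            long x = (t % tiles_x) * tile;
            long y = (t / tiles_x) * tile;
            /* Read tile (x, y, tile, tile) from the slide and run the
             * follicle or glomeruli detection on it here. */
            (void)x; (void)y;
            mine++;
        }
        printf("rank %d processed %ld of %ld tiles\n", rank, mine, total);

        MPI_Finalize();
        return 0;
    }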

Committee:

Ashok Krishnamurthy, PhD (Advisor); Bradley Clymer, PhD (Committee Member); Kimerly Powell, PhD (Committee Member)

Subjects:

Electrical Engineering; Information Systems; Medical Imaging

Keywords:

Histopathology; Renal image analysis; Lymphoma; IHC; H&E; virtual microscopy; high performance computing; HPC

Rahman, Md Wasi-ur-. Designing and Modeling High-Performance MapReduce and DAG Execution Framework on Modern HPC Systems
Doctor of Philosophy, The Ohio State University, 2016, Computer Science and Engineering
Big Data processing and High-Performance Computing (HPC) are two disruptive technologies that are converging to meet the challenges posed by large-scale data analysis. MapReduce, a popular parallel programming model for data-intensive applications, is being used extensively through different execution frameworks (e.g. batch processing, Directed Acyclic Graph or DAG) on modern HPC systems because of its ease of programming, fault tolerance, and scalability. However, as these applications begin scaling to terabytes of data, the socket-based communication model, which is the default implementation in the open-source MapReduce execution frameworks, becomes a performance bottleneck. Moreover, because of the synchronized nature of staging the data across the various execution phases, the default Hadoop MapReduce framework cannot leverage the full potential of the underlying interconnect. MapReduce frameworks also rely heavily on the availability of local storage media, which introduces space inadequacy for applications that generate a large amount of intermediate data. On the other hand, most leadership-class HPC systems follow the traditional Beowulf architecture with a separate parallel storage system and either no, or very limited, local storage. The storage architectures in these HPC systems are not natively conducive to default MapReduce. Also, the modern high-performance interconnects (e.g. InfiniBand) used to access the parallel storage in these systems can provide extremely low latency and high bandwidth. Additionally, advanced storage architectures, such as Non-Volatile Memories (NVM), can provide byte-addressability as well as data persistence. Efficient utilization of all these resources, through enhanced designs of execution frameworks with a tuned parameter space, is crucial for MapReduce performance and scalability. This work addresses several of the shortcomings of current MapReduce execution frameworks. It presents an enhanced Big Data execution framework, HOMR (Hybrid Overlapping in MapReduce), which improves the MapReduce job execution pipeline by maximizing overlapping among execution phases. HOMR also introduces an RDMA (Remote Direct Memory Access) based shuffle engine with advanced shuffle algorithms to leverage the benefits of the high-performance interconnects used in HPC systems. It minimizes the large number of disk accesses in MapReduce execution frameworks through in-memory operations combined with a fast execution pipeline. This work also proposes different deployment architectures that utilize Lustre as the underlying storage and provides fast shuffle strategies with dynamic adjustments. Priority-based selection of storage for intermediate data ensures the best storage usage at any point of job execution. This work also presents a variant of HOMR that can exploit the byte-addressability of NVM to provide fast execution of MapReduce applications. Finally, a generalized advising framework is presented that can provide optimum configuration recommendations for any MapReduce system, with profiling and prediction capabilities. Through performance modeling of this MapReduce execution framework, techniques for predicting job execution performance are demonstrated on leadership-class HPC clusters at large scale.
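
HOMR's RDMA shuffle engine is not reproduced here, but the overlapping principle it exploits can be illustrated with nonblocking MPI: the shuffle of an already-produced batch of map output progresses while the next batch is being computed. The batch sizes, stand-in map function, and ring-style exchange are purely illustrative and not part of the thesis.

    #include <mpi.h>
    #include <stdlib.h>

    #define BATCH 4096
    #define NBATCH 8

    /* Stand-in for real map work producing one batch of intermediate data. */
    static void map_batch(double *out, int n, int seed)
    {
        for (int i = 0; i < n; i++)
            out[i] = seed * 1000.0 + i;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int dest = (rank + 1) % size, src = (rank + size - 1) % size;

        double *cur = malloc(BATCH * sizeof(double));
        double *prev = malloc(BATCH * sizeof(double));
        double *recvbuf = malloc(BATCH * sizeof(double));
        MPI_Request reqs[2];

        map_batch(prev, BATCH, 0);
        for (int b = 1; b < NBATCH; b++) {
            /* Start shuffling the previous batch ... */
            MPI_Isend(prev, BATCH, MPI_DOUBLE, dest, b, MPI_COMM_WORLD, &reqs[0]);
            MPI_Irecv(recvbuf, BATCH, MPI_DOUBLE, src, b, MPI_COMM_WORLD, &reqs[1]);
            map_batch(cur, BATCH, b);          /* ... while mapping the next one */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            double *tmp = prev; prev = cur; cur = tmp;
        }

        free(cur); free(prev); free(recvbuf);
        MPI_Finalize();
        return 0;
    }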

Committee:

Dhabaleswar Panda (Advisor); Ponnuswamy Sadayappan (Committee Member); Radu Teodorescu (Committee Member)

Subjects:

Computer Engineering; Computer Science

Keywords:

Designing and Modeling; High-Performance MapReduce; DAG Execution Framework; Modern HPC Systems

Depuru, Soma Shekara. Modeling, Detection, and Prevention of Electricity Theft for Enhanced Performance and Security of Power Grid
Doctor of Philosophy in Engineering, University of Toledo, 2012, College of Engineering
This dissertation contributes to the development and implementation of novel algorithms for analyzing the electricity consumption patterns of customers and identifying illegal consumers based on irregularities in consumption. Distribution of electricity involves significant Technical as well as Non-Technical Losses (NTL). Illegal consumption of electricity, or electricity theft, constitutes a major share of NTL. This dissertation discusses several methods implemented by illegal consumers for stealing electricity and provides a relevant literature review. A comprehensive review of the advantages, challenges, and technologies involved in the design, development, and deployment of smart meters is presented. With the advent of advanced metering technologies, real-time energy consumption data will be available at the utility's end, which can be used to detect illegal consumers. This dissertation presents an encoding technique that simplifies the received customer energy consumption readings (patterns) and maps them into corresponding irregularities in consumption. The encoding technique preserves the exclusivity in the energy consumption patterns. It saves significant CPU time in the real-time analysis and classification of customers, in addition to decreasing the memory required to store historical data. The dissertation then elucidates the operation of intelligent classification techniques on customer energy consumption data to classify genuine and illegal consumers. These classification models are applied to regular energy consumption data as well as to the encoded data, to compare the corresponding classification accuracies and computational overhead. Further, the performance and scope of the proposed algorithms are enhanced in two directions: reducing the overall computation time, and including more real-time parameters using High Performance Computers (HPC). The encoding and classification algorithms are parallelized using both task-parallel and data-parallel approaches. In addition, the impact of Time-Based Pricing (TBP) and Distributed Generation (DG) on illegal consumers, as well as on the algorithms used for their detection, is analyzed. The economics of the losses due to illegal consumption of electricity are also explained.
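
A hypothetical illustration of such an encoding is sketched below: each reading is reduced to a small symbol based on its ratio to the customer's historical mean, shrinking the pattern handed to the classifier and making dips stand out. The thresholds, symbols, and sample readings are invented here and are not the dissertation's actual scheme.

    #include <stdio.h>

    /* Map a reading to a coarse symbol relative to the customer's history. */
    static char encode_reading(double kwh, double historical_mean)
    {
        double r = kwh / historical_mean;
        if (r < 0.25) return '0';   /* abnormally low: possible bypass/tamper */
        if (r < 0.75) return '1';   /* below usual consumption */
        if (r < 1.25) return '2';   /* normal band */
        return '3';                 /* above usual consumption */
    }

    int main(void)
    {
        double readings[7] = {12.1, 11.8, 2.0, 1.7, 1.9, 12.5, 11.9}; /* one week */
        double mean = 11.5;                              /* from historical data */
        char code[8];
        for (int i = 0; i < 7; i++)
            code[i] = encode_reading(readings[i], mean);
        code[7] = '\0';
        printf("encoded pattern: %s\n", code);   /* "2200022": mid-week dip flagged */
        return 0;
    }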

Committee:

Lingfeng Wang, PhD (Committee Chair); Vijay Devabhaktuni, PhD (Committee Co-Chair); Mansoor Alam, PhD (Committee Member); Mohamed Samir Hefzy, PhD (Committee Member); Roger King, PhD (Committee Member)

Subjects:

Electrical Engineering

Keywords:

Non-Technical Losses; Detection of Electricity Theft; Neural Network; SVM; Data Classification; Rule Engine; Smart Grid; Smart Meter; Power Grid; Fraud/Illegal consumption of electricity; HPC; DG

Jamaliannasrabadi, Saba. High Performance Computing as a Service in the Cloud Using Software-Defined Networking
Master of Science (MS), Bowling Green State University, 2015, Computer Science
Benefits of Cloud Computing (CC) such as scalability, reliability, and resource pooling have attracted scientists to deploy their High Performance Computing (HPC) applications on the Cloud. Nevertheless, HPC applications can face serious challenges on the cloud that could undermine these gains if care is not taken. This thesis aims to address the shortcomings of the Cloud for HPC applications through a platform called HPC as a Service (HPCaaS). Further, a novel scheme is introduced to improve the performance of HPC task scheduling on the Cloud using the emerging technology of Software-Defined Networking (SDN). The research introduces “ASETS: A SDN-Empowered Task Scheduling System” as an elastic platform for scheduling HPC tasks on the cloud. In addition, a novel algorithm called SETSA is developed as part of the ASETS architecture to manage the scheduling task of the HPCaaS platform. The platform monitors network bandwidths to take advantage of changes when submitting tasks to the virtual machines. Experiments and benchmarking of HPC applications on the Cloud identified virtualization overhead, cloud networking, and cloud multi-tenancy as the primary shortcomings of the cloud for HPC applications. A private Cloud Test Bed (CTB) was set up to evaluate the capabilities of ASETS and SETSA in addressing these problems. Subsequently, the Amazon AWS public cloud was used to assess the scalability of the proposed systems. The results obtained for ASETS and SETSA on both private and public clouds indicate that significant performance improvement of HPC applications can be achieved. Furthermore, the results suggest that the proposed system benefits both cloud service providers and users, since ASETS performs better as the degree of multi-tenancy increases. The thesis also proposes SETSAW (SETSA Window) as an improved version of the SETSA algorithm. Unlike other proposed solutions for HPCaaS, which have either optimized the cloud to make it more HPC-friendly or required adjusting HPC applications to make them more cloud-friendly, ASETS provides a platform for existing cloud systems to improve the performance of HPC applications.
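
The bandwidth-aware selection idea behind SETSA can be sketched roughly as follows: when a task arrives, it is submitted to the virtual machine whose measured link bandwidth is currently highest. The VM table, field names, and bandwidth numbers are invented; a real deployment would obtain the measurements from the SDN controller rather than a static array.

    #include <stdio.h>

    struct vm {
        const char *name;
        double bandwidth_mbps;   /* latest measurement, e.g. from the controller */
        int queued_tasks;
    };

    /* Pick the VM with the highest currently measured bandwidth. */
    static int pick_vm(const struct vm *vms, int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (vms[i].bandwidth_mbps > vms[best].bandwidth_mbps)
                best = i;
        return best;
    }

    int main(void)
    {
        struct vm vms[3] = {
            {"vm-a", 420.0, 2},
            {"vm-b", 910.0, 5},
            {"vm-c", 730.0, 1},
        };
        int chosen = pick_vm(vms, 3);
        printf("submit next HPC task to %s (%.0f Mbps)\n",
               vms[chosen].name, vms[chosen].bandwidth_mbps);
        return 0;
    }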

Committee:

Hassan Rajaei, Ph.D (Advisor); Robert Green, Ph.D (Committee Member); Jong Kwan Lee, Ph.D (Committee Member)

Subjects:

Computer Engineering; Computer Science; Technology

Keywords:

High Performance Computing; HPC; Cloud Computing; Scientific Computing; HPCaaS; Software Defined Networking; SDN; Cloud Networking; Virtualization