Search Results (1 - 8 of 8 Results)

Wang, Kaibo. Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments
Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering
Massively data-parallel processors, Graphics Processing Units (GPUs) in particular, have recently entered the mainstream of general-purpose computing as powerful hardware accelerators for a wide range of applications including databases, medical informatics, and big data analytics. However, despite their performance benefit and cost effectiveness, the utilization of GPUs in production systems still remains limited. A major reason behind this situation is the slow development of a supportive GPU software ecosystem. More specifically, (1) CPU-optimized algorithms for some critical computation problems have irregular memory access patterns with intensive control flows, which cannot be easily ported to GPUs to take full advantage of their fine-grained, massively data-parallel architecture; (2) commodity computing environments are inherently concurrent and require coordinated resource sharing to maximize throughput, while existing systems are still mainly designed for dedicated usage of GPU resources. In this Ph.D. dissertation, we develop efficient software solutions to support the adoption of massively data-parallel processors in general-purpose commodity computing systems. Our research mainly focuses on the following areas. First, to make a strong case for GPUs as indispensable accelerators, we apply GPUs to significantly improve the performance of spatial data cross-comparison in digital pathology analysis. Instead of trying to port existing CPU-based algorithms to GPUs, we design a new algorithm and fully optimize it for the GPU's hardware architecture to achieve high performance. Second, we propose operating system support for automatic device memory management to improve the usability and performance of GPUs in shared general-purpose computing environments. Several effective optimization techniques are employed to ensure the efficient usage of GPU device memory space and to achieve high throughput.
Finally, we develop resource management facilities in GPU database systems to support concurrent analytical query processing. By allowing multiple queries to execute simultaneously, the resource utilization of GPUs can be greatly improved. It also enables GPU databases to be utilized in important application areas where multiple user queries need to make continuous progress simultaneously.

Committee:

Xiaodong Zhang (Advisor); P. Sadayappan (Committee Member); Christopher Stewart (Committee Member); Harald Vaessin (Committee Member)

Subjects:

Computer Engineering; Computer Science

Keywords:

GPUs, Memory Management, Operating Systems, GPU Databases, Resource Management, Digital Pathology

Gideon, John. The Integration of LlamaOS for Fine-Grained Parallel Simulation
MS, University of Cincinnati, 2013, Engineering and Applied Science: Computer Engineering
LlamaOS is a custom operating system that provides much of the basic functionality needed for low-latency applications. It is designed to run in a Xen-based virtual machine on a Beowulf cluster of multi/many-core processors. The software architecture of llamaOS is decomposed into two main components, namely: the llamaNET driver and llamaApps. The llamaNET driver contains Ethernet drivers and manages all node-to-node communications between user application programs that are contained within a llamaApp instance. Typically, each node of the Beowulf cluster will run one instance of the llamaNET driver with one or more llamaApps bound to parallel applications. These capabilities provide a solid foundation for the deployment of MPI applications as evidenced by our initial benchmarks and case studies. However, a message passing standard still needed to be either ported or implemented in llamaOS. To minimize latency, llamaMPI was developed as a new implementation of the Message Passing Interface (MPI), which is compliant with the core MPI functionality. This provides a standardized and easy way to develop for this new system. Performance assessment of llamaMPI was achieved using both standard parallel computing benchmarks and a locally (but independently) developed program that executes parallel discrete event-driven simulations. In particular, the NAS Parallel Benchmarks are used to show the performance characteristics of llamaMPI. In the experiments, most of the NAS Parallel Benchmarks ran faster than, or equal to, their native performance. The benefit of llamaMPI was also shown with the fine-grained parallel application WARPED. The order-of-magnitude lower communication latency in llamaMPI greatly reduced the amount of time that the simulation spent in rollbacks. This resulted in an overall faster and more efficient computation, because less time was spent off the critical path due to causality errors.

Committee:

Philip Wilsey, Ph.D. (Committee Chair); Fred Beyette, Ph.D. (Committee Member); Carla Purdy, Ph.D. (Committee Member)

Subjects:

Computer Engineering

Keywords:

Parallel Computing; Time Warp Simulation; MPI; Operating Systems; Beowulf Cluster; Parallel Discrete Event Simulation

RAMAN, VENKATESH. A STUDY OF CLUSTER PAGING METHODS TO BOOST VIRTUAL MEMORY PERFORMANCE
MS, University of Cincinnati, 2002, Engineering : Computer Engineering
With the increase in CPU speed, the performance of many applications has become limited by Virtual Memory (VM). In the case of large out-of-core applications, a portion of their address space is kept in main memory and the rest is stored on a swap disk. The Operating System maintains free pages in RAM and uses the swap disk as an extension of main memory. It dynamically moves inactive pages to disk and maintains working sets of processes in RAM. This activity is called VM paging. Thus, the VM system is limited by disk performance. Disk access speeds are orders of magnitude slower than CPU or main memory. Improving disk Input/Output (I/O) performance is a major challenge because disks have mechanical delays that cannot be reduced. Many researchers have tried to improve disk performance by combining several disk writes into one large operation. Many new I/O architectures such as DCD, RAPID cache, and LFS focus on improving disk write performance. However, VM I/O is read-dominated and generates a substantial number of read requests. The performance of VM cannot be improved by using a large read cache because VM itself is a cache for pages on disk, and so an additional cache decreases the amount of memory available to the VM system. Thus, improvements in VM paging activity are required in order to improve the performance of the VM system. In this research, we study a Cluster Paging method that helps boost both read and write performance. It converts several small disk I/O operations into one large request. By using a simple grouping technique, the Cluster Paging method groups pages with good locality into a cluster and stores it on disk. This method uses a cluster as the basic unit of an I/O operation. When a page is read, the entire cluster that has the required page is read into memory. Thus, Cluster Paging prefetches pages into memory before they are accessed.
Due to large disk I/O operations and effective prefetching, Cluster Paging can achieve good improvement in VM I/O performance. We study the performance of Cluster Paging using large out-of-core benchmarks. We find that it improves the performance of SPEC 2000 benchmarks by more than 70%. It also improves the performance of the NAS Parallel Benchmarks by more than 97%. These results are encouraging, and we expect that VM performance could be further improved with Translation Lookaside Buffer (TLB) and hardware support.
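The cluster-read idea described above can be illustrated with a toy simulation (purely a sketch, not the thesis's kernel implementation; the cluster size of 4 pages is an assumed value): on a page fault, demand paging fetches one page per disk read, while cluster paging fetches the whole fixed-size cluster containing the faulting page, so nearby pages are prefetched.

```python
CLUSTER_SIZE = 4  # pages per cluster (hypothetical value for illustration)

def demand_paging_reads(refs, resident):
    """Count disk reads when each fault fetches exactly one page."""
    reads = 0
    for page in refs:
        if page not in resident:
            resident.add(page)
            reads += 1
    return reads

def cluster_paging_reads(refs, resident, cluster_size=CLUSTER_SIZE):
    """Count disk reads when a fault fetches the entire on-disk cluster
    that contains the faulting page (neighbouring pages are prefetched)."""
    reads = 0
    for page in refs:
        if page not in resident:
            base = (page // cluster_size) * cluster_size
            resident.update(range(base, base + cluster_size))
            reads += 1
    return reads

if __name__ == "__main__":
    refs = list(range(16))  # a sequential scan of 16 pages
    print(demand_paging_reads(refs, set()))   # 16 single-page reads
    print(cluster_paging_reads(refs, set()))  # 4 cluster reads
```

For access patterns with good locality, each cluster read satisfies several future faults, which is the source of the read-side improvement the abstract reports.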

Committee:

Dr. Yiming Hu (Advisor)

Keywords:

virtual memory; page fault; cluster paging; swap cache; operating systems

Musunuru, Venkata Krishna Kanth. Virtuo-ITS: An Interactive Tutoring System to Teach Virtual Memory Concepts of an Operating System
Master of Science (MS), Wright State University, 2017, Computer Science
Interactive tutoring systems are software applications that help individuals to learn difficult concepts. They can allow students to interact with ideas from essential mathematics to more complicated subjects like software engineering. This thesis concentrates on one such interactive tutoring system (ITS) designed for teaching concepts related to operating system virtual memory. Operating system concepts can be troublesome to learn without having someone or something to explain them. Even when an instructor is able to provide detailed explanations, it is still exceptionally difficult for students without a computer science foundation to comprehend these concepts. Students require a sophisticated set of mental models to comprehend how various components of the operating system work together. In a lecture, students may find it hard to imagine the various operating system processes or how they work. A tutoring system that visually shows these concepts to students and lets them interact with models of the various components can make learning much easier. This thesis discusses such an ITS called Virtuo-ITS. The aim of this ITS is to aid individuals in learning virtual memory concepts like paging and virtual address to physical address translation. Virtuo-ITS visually explains concepts of virtual memory and provides tasks for learners to test their understanding of the concepts. An individual can interact with the system and control the virtual memory processes that are happening to develop a better mental model of each of the concepts. This fulfills the principal aim of an ITS, which is to teach difficult concepts simply.
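The virtual-to-physical address translation that Virtuo-ITS visualizes can be sketched in a few lines (the page table contents and the 4 KB page size below are hypothetical, not taken from the system): the virtual address is split into a page number and an offset, and the page number is looked up in a page table to find the physical frame.

```python
PAGE_SIZE = 4096  # bytes per page (a common, assumed value)

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(vaddr, table, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset), then map the
    page number to a frame number; a missing entry models a page fault."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn not in table:
        raise KeyError(f"page fault: virtual page {vpn} not resident")
    return table[vpn] * page_size + offset

if __name__ == "__main__":
    # vaddr 4100 -> vpn 1, offset 4 -> frame 2 -> 2 * 4096 + 4 = 8196
    print(translate(4100, page_table))
```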

Committee:

Adam R. Bryant, Ph.D. (Advisor); Michelle A. Cheatham, Ph.D. (Committee Member); Mateen M. Rizki, Ph.D. (Committee Member)

Subjects:

Artificial Intelligence; Computer Science; Educational Software; Information Technology

Keywords:

Interactive tutoring systems; ITS; Virtual memory visualization; Paging; Address transformation; Operating systems

Zheng, Mai. Towards Manifesting Reliability Issues In Modern Computer Systems
Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering
Computer systems are evolving all the time. In particular, the two most fundamental components, i.e., the compute unit and the storage unit, have witnessed dramatic changes in recent years. For example, on the compute side, graphics processing units (GPUs) have emerged as an extremely cost-effective means for achieving high-performance computing. Similarly, on the storage side, flash-based solid-state drives (SSDs) are revolutionizing the whole IT industry. While these new technologies have improved the performance of computer systems to a new level, they also bring new challenges to the reliability of the systems. As a new computing platform, GPUs enforce a novel multi-threaded programming model. Like any multi-threaded environment, data races on GPUs can severely affect the correctness of the applications and may lead to data loss or corruption. Similarly, as a new storage medium, SSDs also bring potential reliability challenges to the already complicated storage stack. Among other things, the behavior of SSDs during power faults, which happen even in leading data centers, is an important yet mostly ignored issue in this dependability-critical area. Besides SSDs, another important layer in the modern storage stack is the database. The atomicity, consistency, isolation, and durability (ACID) properties modern databases provide make it easy for application developers to create highly reliable applications. However, the ACID properties are far from trivial to provide, particularly when high performance must be achieved. This leads to complex and error-prone code: even at a low defect rate of one bug per thousand lines, the millions of lines of code in a commercial OLTP database can harbor thousands of bugs. As the first step towards building robust modern computer systems, this dissertation proposes novel approaches to detect and manifest the reliability issues in three different layers of computer systems.
First, in the application layer, this dissertation proposes a low-overhead method for detecting races in GPU applications. The method combines static analysis with a carefully designed dynamic checker for logging and analyzing information at runtime. The design utilizes the GPU's memory hierarchy to log runtime data accesses efficiently. To improve the performance, we leverage static analysis to reduce the number of statements that need to be instrumented. Additionally, by exploiting the knowledge of thread scheduling and the execution model in the underlying GPUs, our approach can accurately detect data races with no false positives reported. Our experimental results show that compared to previous approaches, our method is more effective in detecting races in the evaluated cases and incurs much less runtime and space overhead. Second, in the device layer, this dissertation proposes an effective framework to expose reliability issues in SSDs under power faults. The framework includes specially designed hardware to inject power faults directly into devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test fifteen commodity SSDs from five different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure. Third, in the systems software layer, this dissertation proposes a novel record-and-replay framework to expose and diagnose violations of the ACID properties in modern databases.
The framework includes workloads to exercise the ACID guarantees, a record-and-replay subsystem to allow the controlled injection of simulated power faults, a ranking algorithm to prioritize where to fault based on our experience, and a multi-layer tracer to diagnose root causes. Using the framework, we study 8 widely-used databases, ranging from open-source key-value stores to high-end commercial OLTP servers. Surprisingly, all 8 databases exhibit erroneous behavior. For the open-source databases, we are able to diagnose the root causes using our tracer, and for the proprietary commercial databases we can reproducibly induce data loss.
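The log-and-analyze flavor of race detection described in the first part of this abstract can be sketched as a toy post-mortem checker (purely illustrative; the dissertation's detector works on GPU access logs and exploits the GPU scheduling and execution model, none of which this sketch captures): an address is flagged when it is written by one thread and also accessed by a different thread, with no synchronization modeled.

```python
from collections import defaultdict

def find_races(log):
    """Toy post-mortem race check over a flat access log.

    Each entry is (thread_id, address, op) with op in {"R", "W"}.
    An address is racy if some thread wrote it and a different
    thread also accessed it (read or write)."""
    by_addr = defaultdict(list)
    for tid, addr, op in log:
        by_addr[addr].append((tid, op))
    racy = []
    for addr, accs in by_addr.items():
        writers = {t for t, op in accs if op == "W"}
        threads = {t for t, _ in accs}
        if writers and len(threads) > 1:
            racy.append(addr)
    return sorted(racy)

if __name__ == "__main__":
    log = [(0, 100, "W"), (1, 100, "R"),   # write/read conflict -> race
           (0, 200, "R"), (1, 200, "R"),   # read-only sharing -> no race
           (2, 300, "W")]                  # single thread -> no race
    print(find_races(log))  # [100]
```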

Committee:

Feng Qin (Advisor); Gagan Agrawal (Committee Member); Xiaodong Zhang (Committee Member)

Subjects:

Computer Science

Keywords:

computer systems, reliability, storage systems, solid-state drives, SSD, databases, graphics processing units, GPU, high-performance computing, operating systems

Isaacs, Dov. Computer operating system facilities for the automatic control & activity scheduling of computer-based management systems
Doctor of Philosophy, The Ohio State University, 1977, Graduate School

Committee:

Not Provided (Other)

Subjects:

Computer Science

Keywords:

Management information systems;Operating systems

Isaacs, Dov. Computer operating system facilities for the automatic control & activity scheduling of computer-based management systems
Doctor of Philosophy, The Ohio State University, 1977, Computer and Information Science

Committee:

Clinton Foulk (Advisor)

Keywords:

CBMS; PERT; Operating Systems; Data Processing; Control Procedures

KUNAPULI, UDAYKUMAR. A STUDY OF SWAP CACHE BASED PREFETCHING TO IMPROVE VIRTUAL MEMORY PERFORMANCE
MS, University of Cincinnati, 2002, Engineering : Computer Engineering
With the dramatic increase in processor speeds over the last decade, disk latency has become a critical issue in computer systems performance. Disks, being mechanical devices, are orders of magnitude slower than the processor or physical memory. Most Virtual Memory (VM) systems use disk as secondary storage for idle data pages of an application. The working set of pages is kept in memory. When a page requested by the processor is not present in memory, it results in a page fault. On a page fault, the Operating System brings the requested page from the disk into memory. Thus, the performance of Virtual Memory systems depends on disk performance. In this project, we aim to reduce the effect of disks on Virtual Memory performance compared to the traditional demand paging system. We study novel techniques of page grouping and prefetching to improve Virtual Memory system performance. We group pages, evicted from memory at about the same time, into a single large block. On a page fault, we prefetch the entire block along with the faulting page. We implement this grouping and prefetching scheme with a swap cache. The swap cache combines a group of pages, evicted from memory, into a superblock. The superblock is the basic unit of I/O operation during paging and swapping. During a disk read, the entire superblock that has the required page is read from the disk directly into memory. We prefetch all pages with memory eviction locality in a single disk read. From this study, we find that swap cache based prefetching significantly reduces the number of read accesses to the disk. Our simulations show that the number of read accesses to the disk reduced by at least 12% for all the six SPEC 2000 benchmark applications used in this study. For some applications, the number of read accesses reduced by as much as 90%. We also find improvement in Virtual Memory I/O performance of many SPEC 2000 benchmark applications.
With the swap cache, Virtual Memory performance of five of the six SPEC 2000 benchmark applications improved by at least 25%, with some improving up to 88%.
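The eviction-time grouping described above can be modeled with a small sketch (illustrative only; the superblock size of 4 and the class shape are assumptions, not the thesis's implementation): pages evicted at about the same time are packed into one superblock, and a later fault on any member brings the whole superblock back in a single disk read.

```python
SUPERBLOCK = 4  # pages per superblock (hypothetical value)

class SwapCache:
    """Toy model of swap-cache grouping: consecutive evictions fill a
    superblock; a fault on any member reads the whole superblock back."""
    def __init__(self, size=SUPERBLOCK):
        self.size = size
        self.pending = []    # evicted pages not yet packed into a block
        self.block_of = {}   # page -> superblock id
        self.blocks = {}     # superblock id -> list of member pages
        self.disk_reads = 0

    def evict(self, page):
        """Buffer an evicted page; seal a superblock when it fills."""
        self.pending.append(page)
        if len(self.pending) == self.size:
            bid = len(self.blocks)
            self.blocks[bid] = list(self.pending)
            for p in self.pending:
                self.block_of[p] = bid
            self.pending.clear()

    def fault(self, page):
        """One disk read returns every page in the faulting page's
        superblock (just the page itself if it was never sealed)."""
        self.disk_reads += 1
        bid = self.block_of.get(page)
        return set(self.blocks[bid]) if bid is not None else {page}

if __name__ == "__main__":
    sc = SwapCache()
    for p in (10, 11, 12, 13):   # evicted at about the same time
        sc.evict(p)
    print(sc.fault(11))          # the whole group returns in one read
```

Because pages evicted together tend to be used together again, the single read that services one fault also satisfies the faults that would have followed, which is where the reported reduction in disk read accesses comes from.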

Committee:

Dr. Yiming Hu (Advisor)

Keywords:

virtual memory; page fault; swap cache; prediction; operating systems