Search Results (1 - 6 of 6 Results)


Su, Yu. Big Data Management Framework based on Virtualization and Bitmap Data Summarization
Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering
In recent years, science has become increasingly data driven. Data collected from instruments and simulations is extremely valuable for a variety of scientific endeavors. The key challenge facing these efforts is that dataset sizes continue to grow rapidly. With the growing computational capabilities of parallel machines, the temporal and spatial scales of simulations are becoming increasingly fine-grained. However, data transfer bandwidths and disk IO speeds are growing at a much slower pace, making it extremely hard for scientists to transport these rapidly growing datasets. Our overall goal is to provide a virtualization and bitmap based data management framework for “big data” applications.

The challenges arise from four aspects. First, the “big data” problem creates a strong requirement for efficient but lightweight server-side data subsetting and aggregation to decrease the data loading and transfer volume and to help scientists find the subsets of the data that are of interest to them. Second, data sampling, which selects a small set of samples to represent the entire dataset, can greatly decrease the data processing volume and improve efficiency. However, finding a sample accurate enough to preserve scientific data features is difficult, and estimating sampling accuracy is also time-consuming. Third, correlation analysis over multiple variables plays a very important role in scientific discovery. However, scanning through multiple variables for correlation calculation is extremely time-consuming. Finally, because of the huge gap between computing and storage speeds, a large fraction of data analysis time is spent on IO. In an in-situ environment, it is very difficult to generate, before the data is written to disk, a smaller profile that represents the original dataset and still supports different analyses.

In our work, we propose a data management framework to support more efficient scientific data analysis, consisting of two modules: SQL-based Data Virtualization and Bitmap-based Data Summarization. The SQL-based Data Virtualization module supports high-level SQL-like queries over different kinds of low-level data formats such as NetCDF and HDF5. From the scientists’ perspective, all they need to know is how to use SQL queries to specify their data subsetting, aggregation, sampling, or even correlation analysis requirements; our module automatically translates the high-level SQL queries into low-level data access operations, fetches the data subsets, performs the different calculations, and returns the final results to the scientists. The Bitmap-based Data Summarization module treats the bitmap index as a data summary and supports different kinds of analysis using only bitmaps. Indexing technology, especially bitmap indexing, has been widely used in the database field to improve query efficiency. The major contribution of our work is the finding that the bitmap index preserves both the value distribution and the spatial locality of a scientific dataset; hence, it can be treated as a much smaller summary of the data. We demonstrate that many different kinds of analyses can be supported using only bitmaps.
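
As a concrete illustration of the idea that a bitmap index preserves both value distribution and spatial locality, here is a minimal Python sketch (not code from the dissertation; NumPy, the `temperature` variable, and the bin edges are all assumed for illustration):

```python
# Illustrative sketch: a tiny bitmap index over a 1-D scientific variable.
# Values are binned; each bin gets one bitmap whose i-th bit records whether
# element i falls in that bin. Range queries become bitwise ORs, and the
# bitmaps themselves summarize the value distribution (bit counts per bin)
# and the spatial locality (bit positions).
import numpy as np

def build_bitmap_index(values, bin_edges):
    """Return one boolean bitmap per bin."""
    bin_ids = np.digitize(values, bin_edges)          # bin of each element
    return {b: (bin_ids == b) for b in range(1, len(bin_edges))}

def range_query(index, lo_bin, hi_bin):
    """OR together the bitmaps of bins lo_bin..hi_bin to select matching cells."""
    result = np.zeros_like(next(iter(index.values())))
    for b in range(lo_bin, hi_bin + 1):
        result |= index[b]
    return result

temperature = np.random.uniform(0.0, 100.0, size=1_000)   # hypothetical variable
index = build_bitmap_index(temperature, bin_edges=np.linspace(0, 100, 11))
hot_cells = range_query(index, lo_bin=8, hi_bin=10)        # values roughly >= 70
print(hot_cells.sum(), "cells selected")
```

Subsetting and aggregation queries of the kind the framework targets reduce to such bitwise operations, which is what makes the bitmaps usable as a standalone summary.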

Committee:

Gagan Agrawal (Advisor)

Subjects:

Computer Science

Keywords:

Big Data; High-Performance Computing; Bitmap Index; Data Virtualization; Sampling; Correlation Analysis; Time Steps Selection; In-Situ Analysis; Distributed Computing; Scientific Data Management; Wide-area Data Transfer

Lancaster, Robert. Low Latency Networking in Virtualized Environments
MS, University of Cincinnati, 2012, Engineering and Applied Science: Computer Engineering

Beowulf clusters are popular in the field of high performance computing (HPC). Customized operating systems have been used to achieve speedup in HPC by providing specific mechanisms to support the application and by eliminating OS jitter. Virtualized operating systems make it possible to run customized operating systems in a shared environment. The principal drawback of virtualized operating systems for HPC is the added I/O latency of virtualization.

Para-virtualized I/O, when coupled with a lightweight protocol, can reduce and in many cases eliminate the latency gap between native network I/O and virtualized network I/O. This study finds that the latency performance of para-virtualized InfiniBand over Ethernet matches or exceeds the performance of native TCP/IP for messages over 128 bytes.
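
To make the latency comparison concrete, the sketch below measures round-trip message latency over plain TCP sockets in Python. It is not the thesis's benchmark (which used para-virtualized InfiniBand over Ethernet against native TCP/IP); the host, port, and message sizes are illustrative assumptions:

```python
# Illustrative sketch: a TCP ping-pong that measures latency for varying
# message sizes, the kind of measurement behind claims such as "matches
# native TCP/IP above 128-byte messages".
import socket, statistics, threading, time

HOST, PORT = "127.0.0.1", 9999            # hypothetical endpoint

def echo_server():
    with socket.create_server((HOST, PORT)) as srv:
        while True:                        # serve one client at a time
            conn, _ = srv.accept()
            with conn:
                while data := conn.recv(65536):
                    conn.sendall(data)

def measure_latency(size, rounds=1000):
    payload = b"x" * size
    with socket.create_connection((HOST, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        samples = []
        for _ in range(rounds):
            t0 = time.perf_counter()
            sock.sendall(payload)
            received = 0
            while received < size:         # wait for the full echo
                received += len(sock.recv(65536))
            samples.append(time.perf_counter() - t0)
    return statistics.median(samples) / 2  # rough one-way estimate

if __name__ == "__main__":
    threading.Thread(target=echo_server, daemon=True).start()
    time.sleep(0.2)                        # give the server a moment to bind
    for size in (16, 64, 128, 512, 2048):
        print(f"{size:5d} B: {measure_latency(size) * 1e6:8.1f} us one-way")
```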

Committee:

Philip Wilsey, PhD (Committee Chair); Fred Beyette, PhD (Committee Member); Wen Ben Jone, PhD (Committee Member)

Subjects:

Computer Engineering

Keywords:

Virtualization; low latency; Active Messages; para-virtualization; InfiniBand; IBoE

Balaraman, Subha. Bill Share - Capacity Planning and Management
Master of Science, The Ohio State University, 2012, Computer Science and Engineering

Management of the information technology of today's enterprise must address the twofold challenge of meeting customer expectations on one hand while staying competitive by controlling IT costs on the other hand. Round-the-clock performance, availability and security are the crucial indicators of quality of service for mission critical web applications. Tradeoffs are necessary to accommodate hardware limitations, quality issues and budget concerns. Capacity planning techniques are essential for ensuring the quality of web services within budget. In particular, with multi-tier architectures becoming industry standards, it is important to design effective and accurate performance prediction models under an enterprise production environment with a real workload mix.

The primary objective of the research described in this thesis is to understand the techniques for capacity planning within an enterprise organization when a significant new service is integrated. We first propose a new online service component named Bill Share and describe how this service integrates with the other existing web services of the enterprise; the goal of the new service is to increase sales and, in turn, profits. We then perform capacity modeling with different hardware models and evaluate the performance metrics against a similar bill-tracking web application. Finally, we build an application prototype to validate the response times, availability, and server utilizations predicted by our model.
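
As a hedged illustration of the kind of performance prediction such capacity planning relies on, the following Python sketch models each tier of a multi-tier application as an open M/M/1 queue; the arrival rate and per-tier service demands are invented numbers, not figures from the thesis:

```python
# Illustrative sketch: back-of-the-envelope multi-tier capacity check.
# Each tier is an M/M/1 queue; predicted utilization and response time
# fall directly out of the workload parameters.
ARRIVAL_RATE = 40.0                       # requests/second (assumed)

# Per-tier mean service demand in seconds (web, app, database) -- assumed.
SERVICE_DEMANDS = {"web": 0.005, "app": 0.012, "db": 0.018}

def mm1_metrics(arrival_rate, service_time):
    """Utilization and mean response time of an M/M/1 queue."""
    rho = arrival_rate * service_time
    if rho >= 1.0:
        raise ValueError("tier is saturated; add capacity")
    return rho, service_time / (1.0 - rho)

total_response = 0.0
for tier, demand in SERVICE_DEMANDS.items():
    rho, resp = mm1_metrics(ARRIVAL_RATE, demand)
    total_response += resp
    print(f"{tier:>3}: utilization {rho:5.1%}, response {resp * 1e3:6.1f} ms")
print(f"end-to-end response: {total_response * 1e3:.1f} ms")
```

Validating such predictions against a prototype, as the thesis does with LoadRunner-style measurements, is what turns the model into a usable planning tool.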

Committee:

Rajiv Ramnath, PhD (Advisor); Gagan Agrawal, PhD (Committee Chair); Jayashree Ramanathan, PhD (Committee Member)

Subjects:

Computer Engineering; Computer Science

Keywords:

capacity planning; virtualization; capacity management; bill sharing; loadrunner; capacity modeling

Craig, Kyle. Exploration and Integration of File Systems in LlamaOS
MS, University of Cincinnati, 2014, Engineering and Applied Science: Computer Engineering
LlamaOS is a minimalist and customizable operating system for Xen-based virtual machines running on parallel cluster hardware. With the addition of LlamaMPI, a custom implementation of the standard Message Passing Interface (MPI), LlamaOS is able to achieve significant performance gains for fine-grained parallel applications. However, the initial version of LlamaOS lacked full file system support, which restricted it to a limited set of applications: each instance in the parallel virtual machine could access only a single read-only file.

The original design of LlamaOS was motivated by work in parallel discrete-event simulation, and the single-file file system had significant drawbacks there. Some simulation models need multiple configuration files to run properly; to make these models run on LlamaOS, the parallel simulator under evaluation had to be modified to use a single configuration file. This went against one of the base principles of LlamaOS, namely that outside applications can be run with little to no modification. Another major drawback was that models could not create and write their own results files, limiting the amount of data that could be gathered from runs in LlamaOS.

To alleviate these issues, this thesis explores the implementation of a bare-bones Virtual File System (VFS), and, building on it, support for a Second Extended File System (Ext2) driver has been added to LlamaOS. To test the functionality and performance of the new LlamaOS file system, the bonnie benchmark (a common tool for benchmarking various types of file I/O at a basic system level) was ported to run within LlamaOS. The benchmark showed promising results and verified that the file system implementation was functional and able to perform file reads, file writes, and file creation. In addition to the bonnie benchmark, a couple of parallel simulation models were used to verify that the file system implementation achieved its goal of efficiently supporting parallel simulation within LlamaOS with little to no modification to the original simulation kernel.
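
The following Python sketch illustrates the general shape of a bare-bones VFS like the one described: a mount table dispatching path operations to pluggable filesystem drivers. It is an illustrative analogue, not LlamaOS code, and the in-memory driver merely stands in for a real driver such as the Ext2 one:

```python
# Illustrative sketch: a minimal VFS layer. Drivers implement a common
# interface; the VFS resolves each path to the driver that owns its
# mount point and forwards the operation.
from abc import ABC, abstractmethod

class FileSystem(ABC):
    @abstractmethod
    def read(self, path: str) -> bytes: ...
    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

class MemFS(FileSystem):
    """Trivial in-memory filesystem used as a stand-in driver."""
    def __init__(self):
        self.files: dict[str, bytes] = {}
    def read(self, path):
        return self.files[path]
    def write(self, path, data):
        self.files[path] = data

class VFS:
    def __init__(self):
        self.mounts: dict[str, FileSystem] = {}
    def mount(self, point: str, fs: FileSystem) -> None:
        self.mounts[point] = fs
    def _resolve(self, path: str):
        """Longest-prefix match of path against the mount table."""
        point = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[point], path[len(point):]
    def read(self, path):
        fs, rel = self._resolve(path)
        return fs.read(rel)
    def write(self, path, data):
        fs, rel = self._resolve(path)
        fs.write(rel, data)

vfs = VFS()
vfs.mount("/results/", MemFS())                  # models can now write output
vfs.write("/results/run1.csv", b"event,time\n")  # e.g. a simulation results file
print(vfs.read("/results/run1.csv"))
```

The payoff mirrors the thesis's goal: once the dispatch layer exists, applications read and write multiple files without modification, regardless of which concrete driver backs the mount point.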

Committee:

Philip Wilsey, Ph.D. (Committee Chair); Fred Beyette, Ph.D. (Committee Member); Karen Davis, Ph.D. (Committee Member)

Subjects:

Computer Engineering

Keywords:

File Systems; Ext2; Virtualization; LlamaOS; WARPED; PDES

Jamaliannasrabadi, Saba. High Performance Computing as a Service in the Cloud Using Software-Defined Networking
Master of Science (MS), Bowling Green State University, 2015, Computer Science
Benefits of Cloud Computing (CC) such as scalability, reliability, and resource pooling have attracted scientists to deploy their High Performance Computing (HPC) applications on the Cloud. Nevertheless, HPC applications can face serious challenges on the cloud that could undermine the gained benefits if care is not taken. This thesis aims to address the shortcomings of the Cloud for HPC applications through a platform called HPC as a Service (HPCaaS). Further, a novel scheme is introduced to improve the performance of HPC task scheduling on the Cloud using the emerging technology of Software-Defined Networking (SDN). The research introduces “ASETS: A SDN-Empowered Task Scheduling System” as an elastic platform for scheduling HPC tasks on the cloud. In addition, a novel algorithm called SETSA is developed as part of the ASETS architecture to manage scheduling for the HPCaaS platform. The platform monitors network bandwidths and takes advantage of changes in them when submitting tasks to the virtual machines. Experiments and benchmarking of HPC applications on the Cloud identified virtualization overhead, cloud networking, and cloud multi-tenancy as the primary shortcomings of the cloud for HPC applications. A private Cloud Test Bed (CTB) was set up to evaluate the capabilities of ASETS and SETSA in addressing these problems; subsequently, the Amazon AWS public cloud was used to assess the scalability of the proposed systems. The results obtained for ASETS and SETSA on both private and public clouds indicate that significant performance improvements for HPC applications can be achieved. Furthermore, the results suggest that the proposed system benefits both cloud service providers and users, since ASETS performs better as the degree of multi-tenancy increases. The thesis also proposes SETSAW (SETSA Window) as an improved version of the SETSA algorithm. Unlike other proposed solutions for HPCaaS, which have either optimized the cloud to make it more HPC-friendly or required adjusting HPC applications to make them more cloud-friendly, ASETS aims to provide a platform for existing cloud systems to improve the performance of HPC applications.
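
As a rough illustration of bandwidth-aware task scheduling in the spirit of SETSA (not the actual algorithm, whose details are in the thesis), the Python sketch below greedily submits each task to the VM with the best measured bandwidth per queued task; the bandwidth stub stands in for real SDN controller statistics:

```python
# Illustrative sketch: bandwidth-aware greedy task placement. In an SDN
# setting the bandwidth figures would come from controller statistics;
# here a random stub plays that role.
import random

class VM:
    def __init__(self, name):
        self.name = name
        self.queued_tasks = 0
    def measured_bandwidth_mbps(self):
        return random.uniform(100, 1000)   # stub for an SDN bandwidth query

def schedule(tasks, vms):
    """Place each task on the VM with the best bandwidth per queued task."""
    for task in tasks:
        best = max(vms, key=lambda v: v.measured_bandwidth_mbps()
                                      / (1 + v.queued_tasks))
        best.queued_tasks += 1
        print(f"task {task} -> {best.name}")

schedule(tasks=range(8), vms=[VM("vm-a"), VM("vm-b"), VM("vm-c")])
```

Because placement reacts to current network conditions rather than static assignments, this style of scheduler degrades gracefully as multi-tenancy increases, which matches the behavior the thesis reports for ASETS.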

Committee:

Hassan Rajaei, Ph.D. (Advisor); Robert Green, Ph.D. (Committee Member); Jong Kwan Lee, Ph.D. (Committee Member)

Subjects:

Computer Engineering; Computer Science; Technology

Keywords:

High Performance Computing; HPC; Cloud Computing; Scientific Computing; HPCaaS; Software Defined Networking; SDN; Cloud Networking; Virtualization

Patali, Rohit. Utility-Directed Resource Allocation in Virtual Desktop Clouds
Master of Science, The Ohio State University, 2011, Computer Science and Engineering

User communities are rapidly transitioning their "traditional desktops", which have dedicated hardware and software installations, into "virtual desktop clouds" (VDCs) that are accessible via thin-clients. To allocate and manage VDC resources for Internet-scale desktop delivery, existing work focuses mainly on managing server-side resources based on utility functions of CPU and memory loads, and does not consider network health and thin-client user experience. Resource allocation without combined utility-directed information on system loads, network health, and thin-client user experience in VDC platforms inevitably results in costly guesswork and over-provisioning of resources.

In this thesis, an analytical model, the "Utility-Directed Resource Allocation Model" (U-RAM), is presented to solve the combined utility-directed resource allocation problem within VDCs. The solution uses an iterative algorithm that leverages utility functions of system, network, and human components obtained using a novel virtual desktop performance benchmarking toolkit, "VDBench". The combined utility functions direct decision schemes based on Kuhn-Tucker optimality conditions for creating user desktop pools and determining the optimal resource allocation size and location. U-RAM is evaluated in a VDC testbed featuring: (a) popular user applications (spreadsheet calculator, Internet browser, media player, interactive visualization), and (b) TCP/UDP-based thin-client protocols (RDP, RGS, PCoIP) under a variety of user load and network health conditions. Evaluation results demonstrate that the U-RAM solution maximizes VDC scalability, i.e., 'VDs per core density' and 'user connections quantity', while delivering a satisfactory thin-client user experience.
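
As a simplified illustration of utility-directed allocation (a greedy, discrete stand-in for the Kuhn-Tucker-based scheme, not U-RAM itself), the Python sketch below allocates CPU units across hypothetical desktop pools by marginal utility; the pool names and weights are assumptions rather than benchmarked VDBench utilities:

```python
# Illustrative sketch: greedy marginal-utility allocation. Each desktop
# pool has a concave utility of its CPU share; every unit of CPU goes to
# the pool whose utility would rise the most, so allocation stops improving
# exactly where marginal utilities equalize (the continuous KKT condition).
import math

POOLS = {"browsing": 1.0, "media": 1.8, "visualization": 2.6}  # assumed weights
TOTAL_CPU_UNITS = 20

def utility(weight, units):
    return weight * math.log1p(units)      # concave: diminishing returns

alloc = {pool: 0 for pool in POOLS}
for _ in range(TOTAL_CPU_UNITS):
    # marginal gain of one more unit for each pool
    gains = {p: utility(w, alloc[p] + 1) - utility(w, alloc[p])
             for p, w in POOLS.items()}
    winner = max(gains, key=gains.get)
    alloc[winner] += 1
print(alloc)   # heavier pools (e.g. visualization) end up with more CPU
```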

Committee:

Rajiv Ramnath (Advisor); Prasad Calyam (Committee Member); Gagan Agrawal (Committee Member)

Subjects:

Computer Engineering; Computer Science

Keywords:

Virtual Desktop Cloud; Desktop Virtualization; Utility; Cloud; Thin Client; Scalability; Performance