Search Results

(Total results 12)

  • 1. Koch, Johnathan Applying Computational Resources to the Down-Arrow Problem

    Master of Science in Mathematics, Youngstown State University, 2023, Department of Mathematics and Statistics

    A graph G is said to arrow a graph H if every red-blue edge coloring of G contains a monochromatic H, written G→H. The down-arrow Ramsey set collects all subgraphs H of a graph G for which G→H; formally, the down-arrow Ramsey set of a graph G is ↓G := {H ⊆ G : G→H}. Calculating this set by way of scientific computing is computationally prohibitive with the resources commonly available to graph theorists and other academics. Using existing research into complete graphs, the down-arrow Ramsey sets for small complete graphs (Kn for 2 ≤ n ≤ 7) can be generated quickly. For larger complete graphs (Kn for 8 ≤ n ≤ 11), specific pre-processing steps are leveraged, in addition to existing data sets, to speed up calculations. Presented is work on the development of a Python script that generates the down-arrow Ramsey set of a graph through efficient memory management and parallel computing methodologies. The down-arrow generator is used to report new results on complete graphs, complete bipartite graphs, and assorted other graphs.

    Committee: Alexis Byers PhD (Advisor); Alina Lazar PhD (Committee Member); Anita O'Mellan PhD (Committee Member) Subjects: Computer Science; Mathematics
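
The brute-force route to the arrowing relation G→H in entry 1 is straightforward for very small graphs: enumerate every red/blue edge coloring and check both color classes for a copy of H. A minimal sketch, assuming networkx for the subgraph checks; the thesis' actual generator adds the memory management and parallelism that this toy version omits:

```python
from itertools import product

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def contains_copy(edges, H):
    """True if the graph formed by `edges` contains a (not necessarily
    induced) copy of H."""
    return GraphMatcher(nx.Graph(edges), H).subgraph_is_monomorphic()


def arrows(G, H):
    """Brute-force test of G -> H: every red/blue coloring of E(G) must
    leave a monochromatic copy of H. Exponential in |E(G)|."""
    edges = list(G.edges())
    for coloring in product((0, 1), repeat=len(edges)):
        red = [e for e, c in zip(edges, coloring) if c == 0]
        blue = [e for e, c in zip(edges, coloring) if c == 1]
        if not contains_copy(red, H) and not contains_copy(blue, H):
            return False  # this coloring has no monochromatic H
    return True


# K5 does not arrow K3 (some 2-coloring of K5 avoids monochromatic triangles),
# while arrows(K6, K3) is True since R(3,3) = 6; that case already enumerates
# 2^15 colorings, which is why the thesis needs pre-processing.
print(arrows(nx.complete_graph(5), nx.complete_graph(3)))  # False
```
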
  • 2. Jamaliannasrabadi, Saba High Performance Computing as a Service in the Cloud Using Software-Defined Networking

    Master of Science (MS), Bowling Green State University, 2015, Computer Science

    Benefits of Cloud Computing (CC) such as scalability, reliability, and resource pooling have attracted scientists to deploy their High Performance Computing (HPC) applications on the Cloud. Nevertheless, HPC applications can face serious challenges on the cloud that could undermine the gained benefit if care is not taken. This thesis aims to address the shortcomings of the Cloud for HPC applications through a platform called HPC as a Service (HPCaaS). Further, a novel scheme is introduced to improve the performance of HPC task scheduling on the Cloud using the emerging technology of Software-Defined Networking (SDN). The research introduces “ASETS: A SDN-Empowered Task Scheduling System” as an elastic platform for scheduling HPC tasks on the cloud. In addition, a novel algorithm called SETSA is developed as part of the ASETS architecture to manage the scheduling task of the HPCaaS platform. The platform monitors network bandwidths and takes advantage of changes when submitting tasks to the virtual machines. The experiments and benchmarking of HPC applications on the Cloud identified virtualization overhead, cloud networking, and cloud multi-tenancy as the primary shortcomings of the cloud for HPC applications. A private Cloud Test Bed (CTB) was set up to evaluate the capabilities of ASETS and SETSA in addressing such problems. Subsequently, the Amazon AWS public cloud was used to assess the scalability of the proposed systems. The results obtained for ASETS and SETSA on both private and public clouds indicate that significant performance improvement of HPC applications can be achieved. Furthermore, the results suggest that the proposed system benefits both cloud service providers and users, since ASETS performs better as the degree of multi-tenancy increases. The thesis also proposes SETSAW (SETSA Window) as an improved version of the SETSA algorithm. Unlike other proposed solutions for HPCaaS, which have either optimized the cloud to make it more HPC-fr (open full item for complete abstract)

    Committee: Hassan Rajaei Ph.D (Advisor); Robert Green Ph.D (Committee Member); Jong Kwan Lee Ph.D (Committee Member) Subjects: Computer Engineering; Computer Science; Technology
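
SETSA's actual scheduling logic is not given in the abstract of entry 2; as a hypothetical illustration of the bandwidth-aware idea, a greedy scheduler might rank virtual machines by queued work over monitored bandwidth. The VM names, numbers, and ranking rule below are invented for this sketch:

```python
import heapq


def schedule(tasks_mb, vms):
    """Greedy bandwidth-aware placement: submit each task to the VM whose
    (queued data / observed bandwidth) ratio is currently smallest.
    vms is a list of (name, bandwidth_mbps) pairs; in a real system the
    bandwidth figures would come from the SDN controller's monitoring."""
    heap = [(0.0, name, bw, 0.0) for name, bw in vms]  # (ratio, name, bw, queued_mb)
    heapq.heapify(heap)
    placement = []
    for size in sorted(tasks_mb, reverse=True):  # place the largest tasks first
        ratio, name, bw, queued = heapq.heappop(heap)
        placement.append((size, name))
        queued += size
        heapq.heappush(heap, (queued / bw, name, bw, queued))
    return placement


print(schedule([800, 400, 400, 200], [("vm-a", 1000.0), ("vm-b", 250.0)]))
# -> [(800, 'vm-a'), (400, 'vm-b'), (400, 'vm-a'), (200, 'vm-a')]
```
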
  • 3. Su, Yu Big Data Management Framework based on Virtualization and Bitmap Data Summarization

    Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering

    In recent years, science has become increasingly data driven. Data collected from instruments and simulations is extremely valuable for a variety of scientific endeavors. The key challenge faced by these efforts is that dataset sizes continue to grow rapidly. With the growing computational capabilities of parallel machines, the temporal and spatial scales of simulations are becoming increasingly fine-grained. However, data transfer bandwidths and disk IO speeds are growing at a much slower pace, making it extremely hard for scientists to transport these rapidly growing datasets. Our overall goal is to provide a virtualization and bitmap based data management framework for “big data” applications. The challenges arise from four aspects. First, the “big data” problem creates a strong requirement for efficient but light-weight server-side data subsetting and aggregation, to decrease the data loading and transfer volume and to help scientists find the subsets of the data that are of interest to them. Second, data sampling, which selects a small set of samples to represent the entire dataset, can greatly decrease the data processing volume and improve efficiency. However, finding a sample accurate enough to preserve scientific data features is difficult, and estimating sampling accuracy is also time-consuming. Third, correlation analysis over multiple variables plays a very important role in scientific discovery, but scanning through multiple variables to calculate correlations is extremely time-consuming. Finally, because of the huge gap between computing and storage, a large amount of data analysis time is wasted on IO. In an in-situ environment, generating a smaller profile of the data before it is written to disk, one that represents the original dataset and still supports different analyses, is very difficult. In our work, we proposed a data management framework to support more efficient scientific data analysis, which (open full item for complete abstract)

    Committee: Gagan Agrawal (Advisor) Subjects: Computer Science
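
As an illustration of the bitmap summarization idea in entry 3, here is a toy single-variable bitmap index; the binning scheme and integer-as-bitset representation are choices made for this sketch, not the dissertation's actual format:

```python
def build_bitmaps(values, bin_edges):
    """One Python-int bitset per value bin; bit i is set when values[i]
    falls inside the bin [lo, hi)."""
    bitmaps = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        bits = 0
        for i, v in enumerate(values):
            if lo <= v < hi:
                bits |= 1 << i
        bitmaps.append(bits)
    return bitmaps


def select_range(bitmaps, bin_edges, lo, hi):
    """OR together the bitmaps of every bin that overlaps [lo, hi); the
    result marks matching elements without rescanning the raw data."""
    out = 0
    for (b_lo, b_hi), bm in zip(zip(bin_edges[:-1], bin_edges[1:]), bitmaps):
        if b_lo < hi and lo < b_hi:
            out |= bm
    return out


values = [0.1, 2.7, 1.4, 3.9, 0.8]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
mask = select_range(build_bitmaps(values, edges), edges, 1.0, 3.0)
print([i for i in range(len(values)) if mask >> i & 1])  # -> [1, 2]
```

With bin-aligned query bounds the answer is exact; unaligned bounds would additionally require a candidate check against the two boundary bins.
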
  • 4. Bicer, Tekin Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments

    Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering

    Scientific applications, simulations, and instruments generate massive amounts of data. This data not only contributes to already existing scientific areas, but also leads to new sciences. However, managing this large-scale data and analyzing it are both challenging processes. In this context, we require tools, methods, and technologies such as reduction-based processing structures, cloud computing and storage, and efficient parallel compression methods. In this dissertation, we first focus on parallel and scalable processing of data stored in S3, a cloud storage resource, using compute instances in Amazon Web Services (AWS). We develop MATE-EC2, which allows specification of data processing using a variant of the Map-Reduce paradigm. We show various optimizations, including data organization, job scheduling, and data retrieval strategies, that can be leveraged based on the performance characteristics of cloud storage resources. Furthermore, we investigate the efficiency of our middleware in both homogeneous and heterogeneous environments. Next, we improve our middleware so that users can perform transparent processing on data that is distributed among local and cloud resources. With this work, we maximize the utilization of geographically distributed resources. We evaluate our system's overhead, scalability, and performance with varying data distributions. The users of data-intensive applications have different requirements in hybrid cloud settings, two of the most important being the execution time of the application and the resulting cost on the cloud. Our third contribution is a time and cost model for data-intensive applications that run on hybrid cloud environments. The proposed model lets our middleware adapt to performance changes and dynamically allocate the necessary resources from its environments, so that applications can meet user-specified constraints. Fourth, we investigate compression approaches for scientific datasets and bui (open full item for complete abstract)

    Committee: Gagan Agrawal (Advisor); Feng Qin (Committee Member); Spyros Blanas (Committee Member) Subjects: Computer Science
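
The reduction-based processing structure that entry 4 builds on can be sketched generically: reduce each chunk locally with an associative operation, then combine the partial results. The in-memory chunk list below stands in for S3 objects, and MATE-EC2's data organization and retrieval optimizations are omitted:

```python
from concurrent.futures import ThreadPoolExecutor


def local_reduce(chunk):
    # Per-chunk reduction, here a partial (sum, count) for computing a mean.
    return (sum(chunk), len(chunk))


def combine(a, b):
    # Merge two partial results; the operation must be associative.
    return (a[0] + b[0], a[1] + b[1])


chunks = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]  # stand-ins for S3 objects
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_reduce, chunks))

total, count = partials[0]
for p in partials[1:]:
    total, count = combine((total, count), p)
print(total / count)  # mean over all chunks -> 3.5
```
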
  • 5. Bas, Erdeniz Load-Balancing Spatially Located Computations using Rectangular Partitions

    Master of Science, The Ohio State University, 2011, Computer Science and Engineering

    Distributing spatially located heterogeneous workloads is an important problem in parallel scientific computing. Particle-in-cell simulators, ray tracing, and partial differential equations are some of the applications with spatially located workloads. We investigate the problem of partitioning such workloads (represented as a matrix of non-negative integers) into rectangles, such that the load of the most loaded rectangle (processor) is minimized. Since finding the optimal arbitrary rectangle-based partition is an NP-hard problem, we investigate particular classes of solutions: rectilinear, jagged, and hierarchical. We present a new class of solutions called m-way jagged partitions, propose new optimal algorithms for m-way jagged partitions and hierarchical partitions, propose new heuristic algorithms, and provide worst-case performance analyses for some existing and new heuristics. Balancing the load does not by itself guarantee that the total runtime of an application is minimized; to achieve that, one must also take the communication cost into account. Rectangle-shaped partitioning inherently keeps communication small, yet one should proactively minimize it. The algorithms we propose are tested in simulation on a wide set of instances and compared to state-of-the-art algorithms. Results show that m-way jagged partitions are low in total communication cost and practical to use.

    Committee: Umit V. Catalyurek PhD (Advisor); Radu Teodorescu PhD (Committee Member) Subjects: Computer Science
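
The objective that entry 5's heuristics optimize, the load of the most loaded rectangle, is cheap to evaluate once 2D prefix sums of the load matrix are available. A small sketch, assuming numpy, with a hand-picked rather than optimized rectilinear partition:

```python
import numpy as np


def prefix_sums(load):
    """P[i][j] = sum of load[0:i, 0:j], so any rectangle's load is O(1)."""
    P = np.zeros((load.shape[0] + 1, load.shape[1] + 1), dtype=load.dtype)
    P[1:, 1:] = load.cumsum(0).cumsum(1)
    return P


def rect_load(P, r0, r1, c0, c1):
    """Load of rows [r0, r1) x cols [c0, c1) by inclusion-exclusion."""
    return P[r1, c1] - P[r0, c1] - P[r1, c0] + P[r0, c0]


def bottleneck(load, row_cuts, col_cuts):
    """Max rectangle load of the rectilinear partition induced by the cuts."""
    P = prefix_sums(load)
    rows = [0] + row_cuts + [load.shape[0]]
    cols = [0] + col_cuts + [load.shape[1]]
    return max(
        rect_load(P, rows[i], rows[i + 1], cols[j], cols[j + 1])
        for i in range(len(rows) - 1)
        for j in range(len(cols) - 1)
    )


load = np.array([[4, 1, 1], [1, 1, 1], [1, 1, 4]])
print(bottleneck(load, row_cuts=[1], col_cuts=[2]))  # 2x2 processor grid -> 5
```
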
  • 6. Kumar, Vijay Specification, Configuration and Execution of Data-intensive Scientific Applications

    Doctor of Philosophy, The Ohio State University, 2010, Computer Science and Engineering

    Recent advances in digital sensor technology and numerical simulations of real-world phenomena are resulting in the acquisition of unprecedented amounts of raw digital data. Terms like 'data explosion' and 'data tsunami' have come to describe the uncontrolled rate at which scientific datasets are generated by automated sources ranging from digital microscopes and telescopes to in-silico models simulating the complex dynamics of physical and biological processes. Scientists in various domains now have secure, affordable access to petabyte-scale observational data gathered over time, the analysis of which is crucial to scientific discovery. The availability of commodity components has fostered the development of large distributed systems with high-performance computing resources to support the execution requirements of scientific data analysis applications. Increased levels of middleware support over the years have aimed to provide high scalability of application execution on these systems. However, the high-resolution, multi-dimensional nature of scientific datasets and the complexity of analysis requirements present challenges to efficient application execution on such systems. Traditional brute-force analysis techniques to extract useful information from scientific datasets may no longer meet desired performance levels at extreme data scales. This thesis builds on a comprehensive study involving multi-dimensional data analysis applications at large data scales, and identifies a set of advanced factors or parameters for this class of applications which can be customized in domain-specific ways to obtain substantial improvements in performance. A useful property of these applications is their ability to operate at multiple performance levels based on a set of trade-off parameters, while providing different levels of quality-of-service (QoS) specific to the application instance. To avail the performance benefits brought about by such facto (open full item for complete abstract)

    Committee: P Sadayappan PhD (Advisor); Joel Saltz MD, PhD (Committee Member); Gagan Agrawal PhD (Committee Member); Umit Catalyurek PhD (Committee Member) Subjects: Computer Science
  • 7. Xu, Jiayi Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Extracting and visualizing features from scientific data can help scientists derive valuable insights. An extraction and visualization pipeline usually includes three steps: (1) scientific feature detection, (2) union-find for the features' connected component labeling, and (3) visualization and analysis. As the scale of scientific data generated by experiments and simulations grows, it has become common practice to use distributed computing to handle large-scale data with data-parallelism, where data is partitioned and distributed over parallel processors. Three challenges arise for feature extraction and visualization in scientific applications. First, traditional feature detectors may not be effective and robust enough to capture features of interest across different scientific settings, because scientific features are usually highly nonlinear and are recognized through domain scientists' soft knowledge. Second, existing union-find algorithms are either serial or not scalable enough to deal with the extreme-scale datasets generated in the modern era. Third, existing parallel feature extraction and visualization algorithms fail to automatically reduce communication costs when optimizing the performance of processing units. This dissertation studies scalable scientific feature extraction and visualization to tackle these three challenges. First, we design human-centric interactive visual analytics based on scientists' requirements to address domain-specific feature detection and tracking. We focus on an essential problem in the earth sciences: spatiotemporal analysis of viscous and gravitational fingers. Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like fluids. Previously, scientists mainly detected the finger features using density thresholding, in which they specify certain density thresholds and extract super-level sets from the input density scalar fields. However, the results of density thresholding are sensitive to the select (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Rephael Wenge (Committee Member); Jian Chen (Committee Member) Subjects: Computer Engineering; Computer Science
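
The union-find step in entry 7's pipeline has a compact serial baseline (path compression plus union by rank); it is this baseline that distributed, load-balanced variants such as the dissertation's must scale out:

```python
def find(parent, x):
    """Root of x's component, halving paths as we walk up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, rank, a, b):
    """Merge the components of a and b, attaching the shallower tree."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    if rank[ra] == rank[rb]:
        rank[ra] += 1


n = 6
parent, rank = list(range(n)), [0] * n
for a, b in [(0, 1), (1, 2), (4, 5)]:  # edges between cells in the same feature
    union(parent, rank, a, b)
print([find(parent, i) for i in range(n)])
# cells 0-2 share one label, cell 3 is its own component, cells 4-5 share another
```
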
  • 8. Xing, Haoyuan Optimizing array processing on complex I/O stacks using indices and data summarization

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Increasingly, the ability of human beings to understand the universe and ourselves depends on our ability to obtain and process data. With an explosion of data being generated every day, efficiently storing and querying such data, which is usually multidimensional and can be represented using an array data model, is increasingly vital. Meanwhile, as more and more powerful CPUs and accelerators are added, most modern computing systems contain an increasingly complex I/O stack, ranging from traditional disk-based file systems to heterogeneous accelerators with individual memory spaces. Efficiently accessing such a complex I/O stack in array processing is essential to utilizing the enormous computational power of modern computational platforms. One key to achieving such efficiency is identifying where the data is being generated or stored, and choosing appropriate representation and processing strategies accordingly. This dissertation focuses on optimizing array processing in such complex I/O stacks by studying these two fundamental questions: what data representation should be used, and where the data should be stored and processed. The two basic scenarios of scientific data analytics are considered one by one. The first half of the dissertation tackles the problem of efficiently processing array data post-hoc, presenting a compact array storage format for disk-based data that integrates lossless value-based indexing. Such integrated indices improve the performance of value-based filtering operations by orders of magnitude without sacrificing storage size or accuracy. The dissertation then demonstrates how complex queries, such as equality and similarity array joins, can also be performed on this novel storage. The second half of the dissertation focuses on data generated by simulations on accelerators in-situ, without storing the generated data. The system generates an improved bitmap representation on the GPU to reduce the bandwidth bottleneck between host and accelerat (open full item for complete abstract)

    Committee: Rajiv Ramnath (Advisor); Gagan Agrawal (Advisor); Jason Blevins (Other); Yang Wang (Committee Member); Srinivasan Parthasarathy (Committee Member) Subjects: Computer Engineering; Computer Science
  • 9. Richards, Craig Development of Cyber-Technology Information for Remotely Accessing Chemistry Instrumentation

    Master of Computing and Information Systems, Youngstown State University, 2011, Department of Computer Science and Information Systems

    There exists a wide variety of technologies which allow for remote desktop access, data transfer, encryption, and worldwide communication through the Internet. These technologies, while each independently solving a unique problem, can be combined into a project that resolves all of those problems with one single system. Youngstown State University's Chemistry Department required a highly reliable, unified system to provide remote access, web cam feeds, user security, and encrypted file transfer for computer equipment operating scientific instrumentation. A suitable software solution was developed at Youngstown State University in collaboration with Zethus Software through analysis of technological resources and project requirements, and a process of software development. This thesis describes the cumulus::CyberLab project developed to meet the above requirements. The cumulus::CyberLab project allows students, faculty, and scientists to remotely access millions of dollars of scientific equipment offered by our university from anywhere in the world. To best describe this project, this thesis presents an overview of the project, the work performed in it, and how the project created unique software which is valuable not only to our university but also to other users worldwide.

    Committee: Graciela Perera PhD (Advisor); Allen Hunter PhD (Committee Member); John Sullins PhD (Committee Member) Subjects: Biology; Chemistry; Communication; Computer Science
  • 10. Tirukkovalur, Sravya A Global Address Space Approach to Automated Data Management for Parallel Quantum Monte Carlo Applications

    Master of Science, The Ohio State University, 2011, Computer Science and Engineering

    Quantum Monte Carlo is a large class of computer algorithms that simulate quantum systems with the aim of solving the quantum many-body problem. Typical parallel quantum Monte Carlo (QMC) applications use very large spline interpolation tables that are unmodified after initialization. Although only a small fraction of the table may be accessed by each parallel thread/process in a window of execution, the accesses are quite random. Hence, current implementations of these methods typically use replicated copies of the entire interpolation table at each node of a parallel computer. This limits scalability, since increasing the number of processors does not enable larger systems to be run. In this thesis, we take an automated data management approach which enables existing QMC codes to be adapted with minimal changes to significantly enhance the range of problem sizes that can be run. We primarily use the Global Arrays partitioned global address space (PGAS) model to provide efficient distributed, shared storage, and the implementation is further optimized by intelligent replication, locality, and data-reuse management mechanisms. A transparent software caching mechanism is designed and built on the Global Arrays PGAS programming model to enable QMC codes to overcome their current memory limitations in running large-scale simulations. The new GA read-cache (GRC) has been used to enhance the scalability of QWalk, a popular QMC application.

    Committee: Dr. Sadayappan P (Advisor); Dr. Srinivasan Parthasarathy (Committee Member) Subjects: Computer Science
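
A transparent read cache like the GRC in entry 10 can be sketched as an LRU block cache in front of a remote fetch. Here `fetch_block` stands in for a Global Arrays get, and the block size and eviction policy are invented for this illustration:

```python
from collections import OrderedDict

BLOCK = 1024  # table entries per cached block


class ReadCache:
    """LRU cache of fixed-size blocks over a read-only distributed table."""

    def __init__(self, fetch_block, capacity_blocks):
        self.fetch_block = fetch_block  # pulls one block from the distributed array
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()     # block id -> data, kept in LRU order

    def __getitem__(self, i):
        b = i // BLOCK
        if b in self.blocks:
            self.blocks.move_to_end(b)  # mark most recently used
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[b] = self.fetch_block(b)
        return self.blocks[b][i % BLOCK]


table = list(range(10 * BLOCK))  # stand-in for the read-only spline table
cache = ReadCache(lambda b: table[b * BLOCK:(b + 1) * BLOCK], capacity_blocks=2)
print(cache[5000], cache[5001])  # the second access hits the cached block
```
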
  • 11. Chiu, David Auspice: Automatic Service Planning in Cloud/Grid Environments

    Doctor of Philosophy, The Ohio State University, 2010, Computer Science and Engineering

    Scientific advancements have ushered in staggering amounts of available data and processes, which are now scattered across various locations in the Web, Grid, and, more recently, the Cloud. These processes and data sets are often semantically loosely coupled and must be composed together piecemeal to generate scientific workflows. Understanding how to design, manage, and execute such data-intensive workflows has become increasingly esoteric, confined to a few scientific experts in the field. Despite the development of scientific workflow management systems, which have simplified workflow planning to some extent, a means to reduce the complexity of user interaction without forfeiting robustness has been elusive. This violates the essence of scientific progress, where information should be accessible to anyone. A high-level querying interface akin to common search engines, one that can not only return a relevant set of scientific workflows but also facilitate their execution, may be highly beneficial to users. The development of such a system, which abstracts the complex task of scientific workflow planning and execution from the user, is reported herein. Our system, Auspice: AUtomatic Service Planning In Cloud/Grid Environments, makes the following key contributions. First, a two-level metadata management framework is introduced. At the top level, Auspice captures semantic dependencies among available shared processes and data sets with an ontology. Our system furthermore indexes these shared resources to facilitate fast planning times. This metadata framework enables an automatic workflow composition algorithm, which exhaustively enumerates relevant scientific workflow plans given a few key parameters, a marked departure from requiring users to design and manage workflow plans. By applying models to processes, time-critical and accuracy-aware constraints can be realized in this planning algorithm. During the planning phase, Auspice projects thes (open full item for complete abstract)

    Committee: Gagan Agrawal PhD (Advisor); Hakan Ferhatosmanoglu PhD (Committee Member); Christopher Stewart PhD (Committee Member) Subjects: Computer Science
  • 12. Kang, Yixiu Implementation of Forward and Reverse Mode Automatic Differentiation for GNU Octave Applications

    Master of Science (MS), Ohio University, 2003, Electrical Engineering & Computer Science (Engineering and Technology)

    In this work, we present two C/C++ implementations of general purpose automatic differentiation (AD) for GNU Octave applications: FAD for forward mode AD and LogAD for reverse mode AD with bisection checkpointing. Both FAD and LogAD accept functions written in the GNU Octave language and work in the Octave environment via dynamically linked functions. FAD evaluates the product of the Jacobian of the input function and an arbitrary vector in time and space that are proportional to the time and space used by the original function. LogAD evaluates the product of an arbitrary vector and the Jacobian of the input function via a checkpointing approach first proposed by Griewank in 1992.

    Committee: David Juedes (Advisor) Subjects: Computer Science
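
Forward-mode AD, as in entry 12's FAD, is often explained with dual numbers: carrying a (value, derivative) pair through the computation yields a Jacobian-vector product at a constant-factor cost over the original function, matching the time and space claim above. A minimal Python sketch, not FAD's actual C/C++ implementation:

```python
import math


class Dual:
    """A number carrying its value and directional derivative ("dot")."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

    __rmul__ = __mul__


def sin(x):
    # Chain rule: d/dt sin(x(t)) = cos(x) * x'(t)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def jvp(f, x, v):
    """Evaluate J_f(x) @ v in one forward pass by seeding the dots with v."""
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [o.dot for o in f(*duals)]


f = lambda a, b: (a * b, sin(a))           # f: R^2 -> R^2
print(jvp(f, x=[2.0, 3.0], v=[1.0, 0.0]))  # first Jacobian column: [3.0, cos(2.0)]
```

Reverse mode (as in LogAD) computes the transposed product, a vector times the Jacobian, by recording the computation and sweeping backwards; checkpointing trades recomputation for the memory that recording would otherwise require.
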