Search Results

(Total results 8)
  • 1. Carraher, Lee Approximate Clustering Algorithms for High Dimensional Streaming and Distributed Data

    PhD, University of Cincinnati, 2018, Engineering and Applied Science: Computer Science and Engineering

    Clustering data has gained popularity in recent years due to an expanding opportunity to discover knowledge and collect insights from multiple widely available and diverse data sources. Data clustering offers an intuitive solution to a wide variety of unsupervised classification problems. Clustering solutions to problems often arise in areas in which no ground truth is known, or when the arrival frequency of a data source exceeds the labeling capabilities of a human expert. Due to the continuous, fast-moving nature of many common data streams, such as those from IoT (Internet of Things) devices, social network interactions, e-commerce click-streams, scientific monitoring devices, and network traffic, noise-robust and distributed clustering algorithms are necessary. Often, current clustering methods suffer from one or more drawbacks when applied to these demanding problems. For this reason, we propose a new method for data clustering that is noise resilient and distributed, with a predictable overall complexity. The principal claim of this research is that while many clustering algorithms rigorously optimize a loss function, their convergence often results in finding a local minimum that is indistinguishable from a less computationally rigorous optimization on an approximation of the data. We propose that by removing the rigorous optimization requirement, we can achieve better scalability and parallelism with comparable performance. In this work we design a clustering algorithm along these lines that uses dimensional reduction and hashing to reduce the problem size while still attaining clustering performance comparable to other clustering algorithms. Our proposed method is more robust to noise, with a lower runtime requirement and greater opportunity for shared- and distributed-memory parallelism. This work presents a set of methods for clustering high-dimensional data for a variety of data sources and processing environments. The proposed RPHash al (open full item for complete abstract)

    Committee: Philip Wilsey Ph.D. (Committee Chair); Fred Beyette Ph.D. (Committee Member); Raj Bhatnagar Ph.D. (Committee Member); Anca Ralescu Ph.D. (Committee Member); Dan Ralescu Ph.D. (Committee Member) Subjects: Computer Engineering
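
    The abstract above attributes the approach's scalability to dimensional reduction and hashing. As a rough illustration of that general idea (not the RPHash algorithm itself), the sketch below applies a random projection and then buckets points by quantized hash keys; the synthetic dataset, projected dimension k, and bucket width w are all made-up assumptions.

        import numpy as np

        # Illustrative sketch only: random projection + quantized hashing to form
        # coarse clusters. Not the RPHash algorithm from the dissertation; the
        # projected dimension (k) and bucket width (w) are arbitrary assumptions.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 256))            # 1000 synthetic points, 256 dimensions
        k, w = 8, 4.0                               # projected dimension, quantization width

        P = rng.normal(size=(256, k)) / np.sqrt(k)  # random projection matrix
        Y = X @ P                                   # reduce 256 -> k dimensions

        buckets = {}
        for i, y in enumerate(Y):
            key = tuple(np.floor(y / w).astype(int))  # hash: quantize projected coordinates
            buckets.setdefault(key, []).append(i)

        # Each sufficiently populated bucket is treated as a candidate cluster; its
        # centroid approximates a cluster center without iterative optimization.
        centers = [X[idx].mean(axis=0) for idx in buckets.values() if len(idx) >= 5]
        print(f"{len(buckets)} buckets, {len(centers)} candidate cluster centers")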
  • 2. Singh, Saurabh Characterizing applications by integrating and improving tools for data locality analysis and program performance

    Master of Science, The Ohio State University, 2017, Computer Science and Engineering

    Data locality is a critical factor that affects the execution time of applications today. With major advances being made in reducing the computation time of processors, data movement costs have increasingly become a bottleneck in the running time and energy efficiency of current applications. Dynamic analysis tools help in gaining useful insights into a program's behaviour for a given execution. The work in this thesis extends an existing dynamic analysis framework which has been used to develop dynamic analysis tools, e.g., a tool to identify the vectorization potential of existing programs and a tool for characterizing and assessing the inherent data locality properties of a given computation. This existing framework is based on construction and analysis of the dynamic dependence graph for a given execution. The framework is not well integrated with existing tools that analyze and report program performance. This makes the data locality analysis somewhat inaccessible for popular use. In this thesis, we try to bridge that gap by integrating the tools for the different analyses and reporting the results in a coherent way. We also report the improvements made to these tools during this endeavor, including enabling scalable analysis of large dynamic dependence graphs. Finally, we use the work to characterize certain well-known benchmarks.

    Committee: Ponnuswamy Sadayappan (Advisor); Atanas Rountev (Committee Member) Subjects: Computer Engineering; Computer Science
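
    The framework extended in this thesis is described above as being built on the dynamic dependence graph of an execution. A minimal, hypothetical sketch of that data structure: from a toy instruction trace, add an edge from the last writer of each location to every later reader. The trace format and instruction ids are invented for illustration and are not the framework's actual representation.

        # Minimal sketch of building a dynamic dependence graph from an execution
        # trace: an edge goes from the last writer of a location to each later
        # reader. The trace below is made up; the thesis's framework operates on
        # real instrumented executions.
        trace = [  # (instruction id, locations read, locations written)
            (0, [],         ["a"]),
            (1, [],         ["b"]),
            (2, ["a", "b"], ["c"]),
            (3, ["c"],      ["a"]),
        ]

        last_writer = {}            # location -> instruction id that last wrote it
        edges = []                  # (producer instruction, consumer instruction)
        for instr, reads, writes in trace:
            for loc in reads:
                if loc in last_writer:
                    edges.append((last_writer[loc], instr))  # read-after-write dependence
            for loc in writes:
                last_writer[loc] = instr

        print(edges)   # [(0, 2), (1, 2), (2, 3)]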
  • 3. Lu, Qingda Data Layout Optimization Techniques for Modern and Emerging Architectures

    Doctor of Philosophy, The Ohio State University, 2008, Computer Science and Engineering

    The never-ending pursuit of higher performance is one fundamental driving force of computer science research. Although the semiconductor industry has fulfilled Moore's Law over the last forty years by doubling transistor density every two years, the effectiveness of hardware advances cannot be fully exploited due to the mismatch between the architectural environment and the user program. Program optimization is a key to bridging this gap. In this dissertation, instead of restructuring programs' control flow as in many previous efforts, we have applied several new data layout optimization techniques to address optimization challenges on modern and emerging architectures. In particular, the developed techniques and their unique contributions are as follows. We describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. To obtain a highly optimized index permutation library for dynamic data layout optimization, we develop an integrated optimization framework that addresses a number of issues including tiling for the memory hierarchy, effective handling of memory misalignment, utilization of memory subsystem characteristics, and exploitation of the parallelism provided by the vector instruction sets in current processors. A judicious combination of analytical and empirical approaches is used to determine the most appropriate optimizations. With increasing numbers of cores, future CMPs (Chip Multi-Processors) are likely to have a tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved distribution of the address space. Although such an organization is (open full item for complete abstract)

    Committee: P. Sadayappan (Advisor); Xiaodong Zhang (Committee Member); J. Ramanujam (Committee Member); Atanas Rountev (Committee Member) Subjects: Computer Science
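
    One component described above is an index permutation library tiled for the memory hierarchy. The sketch below shows the basic idea with a blocked 2-D transpose; the tile size is an arbitrary assumption, and the real library additionally handles misalignment and vector instruction sets, which this ignores.

        import numpy as np

        def tiled_transpose(a, tile=64):
            """Blocked 2-D index permutation (transpose). Illustrative only: a real
            index-permutation library would also handle alignment and SIMD; the
            default tile size here is an arbitrary assumption."""
            m, n = a.shape
            out = np.empty((n, m), dtype=a.dtype)
            for i in range(0, m, tile):
                for j in range(0, n, tile):
                    # Each tile fits in cache, so both the read of the source block
                    # and the write of the permuted block stay cache-friendly.
                    out[j:j + tile, i:i + tile] = a[i:i + tile, j:j + tile].T
            return out

        a = np.arange(512 * 768).reshape(512, 768)
        assert np.array_equal(tiled_transpose(a), a.T)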
  • 4. Yap, Xiu Huan Multi-label classification on locally-linear data: Application to chemical toxicity prediction

    Doctor of Philosophy (PhD), Wright State University, 2021, Biomedical Sciences PhD

    Computational models may assist in the identification and prioritization of large chemical libraries. Recent experimental and data curation efforts, such as from the Tox21 consortium, have contributed towards toxicological datasets of increasing numbers of chemicals and toxicity endpoints, creating a golden opportunity for the exploration of multi-label learning and deep learning approaches in this thesis. Multi-label classification (MLC) methods may improve model predictivity by accounting for label dependence. However, current measures of label dependence, such as the correlation coefficient, are inappropriate for datasets with extreme class imbalance, often seen in toxicological datasets. In this thesis, we propose a novel label dependence measure that directly models the conditional probability of a label-pair and displays greater sensitivity than the correlation coefficient for labels with low prior probabilities. MLC models using data-driven label partitioning based on this measure were generally non-inferior to MLC models using random label partitioning. Marginal improvements in model predictivity have prompted toxicology modelers to shy away from deep learning and resort to 'simpler' models, such as k-nearest neighbors, for their greater explainability. Given the prevalence of local, linear quantitative structure-activity relationship (QSAR) models in computational toxicology, we hypothesize that toxicological datasets have locally-linear data structures, resulting in heterogeneous classification spaces that challenge the basic assumptions of most machine learning algorithms. We propose the locality-sensitive deep learner, a modification of deep neural networks which uses an attention mechanism to learn datapoint locality. On carefully-constructed synthetic data with extremely unbalanced classes (10% active) and cluster-specific noise (60%), the locality-sensitive deep learner with learned feature weights retained high test performance (AUC>0.9), while the feed-forward n (open full item for complete abstract)

    Committee: Michael L. Raymer Ph.D. (Advisor); David R. Cool Ph.D. (Committee Member); Lynn K. Hartzler Ph.D. (Committee Member); Travis E. Doom Ph.D. (Committee Member); Courtney E.W. Sulentic Ph.D. (Committee Member) Subjects: Computer Science; Toxicology
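
    The abstract above proposes a label dependence measure that directly models the conditional probability of a label pair rather than relying on the correlation coefficient. A minimal sketch of that contrast, assuming binary label vectors and a synthetic rare label; the exact formulation in the dissertation may differ.

        import numpy as np

        def conditional_dependence(yi, yj):
            """P(label j active | label i active) - P(label j active).
            Illustrative stand-in for a conditional-probability-based dependence
            measure; not necessarily the exact definition used in the thesis."""
            if yi.sum() == 0:
                return 0.0
            return yj[yi == 1].mean() - yj.mean()

        rng = np.random.default_rng(1)
        n = 100_000
        yi = (rng.random(n) < 0.01).astype(int)           # rare label (1% prior)
        yj = np.where(yi == 1,
                      rng.random(n) < 0.5,                # strongly co-occurs when yi fires
                      rng.random(n) < 0.01).astype(int)

        corr = np.corrcoef(yi, yj)[0, 1]
        print(f"correlation={corr:.3f}, conditional measure={conditional_dependence(yi, yj):.3f}")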
  • 5. Hong, Changwan Code Optimization on GPUs

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    Graphics Processing Units (GPUs) have become popular in the last decade due to their high memory bandwidth and powerful computing capacity. Nevertheless, achieving high performance on GPUs is not trivial. It generally requires significant programming expertise and understanding of the details of low-level execution mechanisms in GPUs. This dissertation introduces approaches for optimizing regular and irregular applications. To optimize regular applications, it introduces a novel approach to GPU kernel optimization by identifying and alleviating bottleneck resources. This approach, however, is not effective in irregular applications because of data-dependent branches and memory accesses. Hence, tailored approaches are developed for two popular domains of irregular applications: graph algorithms and sparse matrix primitives. Performance modeling for GPUs is carried out by abstract kernel emulation along with latency/gap modeling of resources. Sensitivity analysis with respect to resource latency/gap parameters is used to predict the bottleneck resource for a given kernel's execution. The utility of the bottleneck analysis is demonstrated in two contexts: i) Enhancing the OpenTuner auto-tuner with the new bottleneck-driven optimization strategy. Effectiveness is demonstrated by experimental results on all kernels from the Rodinia suite and GPU tensor contraction kernels from the NWChem computational chemistry suite. ii) Manual code optimization. Two case studies illustrate the use of a bottleneck analysis to iteratively improve the performance of code from state-of-the-art DSL code generators. However, the above approach is ineffective for irregular applications such as graph algorithms and sparse linear systems. Graph algorithms are used in various applications, and high-level GPU graph processing frameworks are an attractive alternative for achieving both high productivity and high performance. This dissertation develops an approach to graph processing on GPUs (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan (Advisor); Atanas Rountev (Committee Member); Radu Teodorescu (Committee Member) Subjects: Computer Science
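
    The bottleneck analysis described above rests on the sensitivity of a performance model to per-resource latency/gap parameters. A toy sketch under an assumed max-of-resources cost model (not the dissertation's abstract kernel emulation): perturb each resource's gap and report which perturbation moves the predicted time most. All resource names and numbers are invented.

        # Toy sensitivity analysis for bottleneck identification. The cost model
        # (max over per-resource times) and the figures below are assumptions for
        # illustration, not the abstract-kernel-emulation model from the thesis.
        resources = {                 # resource -> (work units, gap/cost per unit)
            "dram_bandwidth": (4.0e6, 2.0e-7),
            "shared_memory":  (1.0e6, 1.0e-7),
            "fp_units":       (8.0e6, 5.0e-8),
        }

        def predicted_time(params):
            # Execution time is limited by the most heavily occupied resource.
            return max(work * gap for work, gap in params.values())

        base = predicted_time(resources)
        sensitivity = {}
        for name, (work, gap) in resources.items():
            perturbed = dict(resources)
            perturbed[name] = (work, gap * 1.10)          # make this resource 10% slower
            sensitivity[name] = predicted_time(perturbed) / base - 1.0

        bottleneck = max(sensitivity, key=sensitivity.get)
        print(sensitivity, "-> bottleneck:", bottleneck)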
  • 6. Nafziger, Jonathan A Novel Cache Migration Scheme in Network-on-Chip Devices

    MS, University of Cincinnati, 2010, Engineering and Applied Science: Computer Engineering

    Future Network-on-Chip (NoC) designs no longer map single cores to each cache slice but rather multiple cores, in layouts known as hybrid architectures. Additional proposals have suggested creating reconfigurable hybrid architectures where the OS can revise core-to-cache mappings as required. However, these designs will still be measured by their ability to reduce the average L2 cache delay. Denser core placements with varying core mappings require cache policies with intelligent data placement schemes; otherwise, there will be no gain to overall system performance as a result of the networked architecture. Solutions such as OS-directed page placement can reduce some of this delay by placing pages in caches local to the initial requestor. However, due to the page-level allocation granularity compared to line-level data accesses, this policy can still result in shared data existing in remote locations in highly parallelized applications. The most effective network delay reduction alternative is line-level data migration. Data migration policies are designed to take advantage of data temporal locality by assuming data recently used by a processor will be used again in the future. Several variations of migration policies have been proposed to address this demand. However, the physical costs, high computation demands and poor scalability of these methods have reduced their effectiveness in future layouts with hundreds of cores. Additionally, many proposals fail to consider migrating data to a centralized location with even latencies for multiple active cores; instead, they reduce latency for a single core at the expense of all others. This best average placement is also known as the nearest-neighbor search or the “Two-Dimensional Post Office Problem”. The proposed Directional Migration solution attempts to solve these problems by providing an autonomous, line-level migration that is responsive to multiple cores with varying access patterns. This design maintains two (open full item for complete abstract)

    Committee: Ranganadha Vemuri PhD (Committee Chair); Carla Purdy PhD (Committee Member); Wen Ben Jone PhD (Committee Member) Subjects: Electrical Engineering
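
    The abstract above frames the migration target as a best average placement for multiple active cores (the "Two-Dimensional Post Office Problem"). A small sketch of that placement objective on a mesh with Manhattan hop distance; the mesh size and sharer coordinates are assumptions, and this brute-force search is not the proposed Directional Migration mechanism.

        from itertools import product

        def best_average_tile(sharers, mesh_w=8, mesh_h=8):
            """Pick the mesh tile minimizing total Manhattan hop distance to all
            cores sharing a cache line. Illustrative only; not the Directional
            Migration scheme proposed in the thesis."""
            def total_hops(tile):
                tx, ty = tile
                return sum(abs(tx - cx) + abs(ty - cy) for cx, cy in sharers)
            return min(product(range(mesh_w), range(mesh_h)), key=total_hops)

        # Three cores on an 8x8 mesh actively accessing the same cache line.
        print(best_average_tile([(0, 0), (7, 1), (3, 6)]))   # -> (3, 1)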
  • 7. Khanna, Gaurav A Data-Locality Aware Mapping and Scheduling Framework for Data-Intensive Computing

    Doctor of Philosophy, The Ohio State University, 2008, Computer Science and Engineering

    Science is becoming increasingly data-driven. With technological advancements such as advanced sensing technologies that can rapidly capture data at high resolutions and Grid technologies that enable increasingly realistic simulation of complex numerical models, scientific applications have become very data-intensive and involve storing and accessing large amounts of data. The LHC experiment at CERN is an example of a high energy physics initiative where the amount of data stored is in petabytes. The end goal in collecting petabytes of simulation data is to gain a better understanding of the problem under study. This essentially involves collaborative analysis of data by scientists across the world, which conforms to a distributed data-intensive computing paradigm where a set of compute, storage and network resources are used in a collective fashion to advance science. Effective scheduling and resource management for such data-intensive applications on distributed resources is critical in order to meet their performance requirements. Efficient scheduling in the aforementioned scenario encompasses two key inter-related problems. The first one is the data staging problem, which involves the staging of data from the simulation/experimental sites to the computational sites where the data analysis needs to be performed. The second one is the job mapping problem, which involves the mapping of data analysis jobs to compute resources so as to maximize the locality of data usage. Traditional batch job schedulers are designed for compute-intensive jobs running at supercomputer centers. They take into account CPU-related metrics (e.g., user-estimated job run times) and system state (e.g., queue wait times) to make scheduling decisions, but they do not take into account data-related metrics. Therefore, there is a need for designing scheduling mechanisms for data-analysis jobs that take into account not only the computation time of the jobs, but also t (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan PhD (Advisor); Umit Catalyurek PhD (Committee Member); Tahsin Kurc PhD (Committee Member); Joel Saltz PhD (Committee Member); Srinivasan Parthasarathy PhD (Committee Member) Subjects: Computer Science
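
    The job mapping problem above asks for assignments of data-analysis jobs to compute resources that maximize locality of data usage. A greedy, illustrative sketch (not the dissertation's scheduling framework): prefer hosts that already hold a job's input, falling back to any free host. The jobs, replica locations, and slot counts are invented inputs.

        # Greedy locality-aware job mapping sketch. Jobs, replica locations, and
        # host capacities are made up; a real scheduler would also weigh compute
        # time, queue state, and staging costs.
        jobs = {"j1": "fileA", "j2": "fileB", "j3": "fileA", "j4": "fileC"}
        replicas = {"fileA": {"host1"}, "fileB": {"host2"}, "fileC": {"host1", "host3"}}
        free_slots = {"host1": 2, "host2": 1, "host3": 1}

        assignment = {}
        for job, infile in jobs.items():
            local = [h for h in replicas[infile] if free_slots.get(h, 0) > 0]
            candidates = local or [h for h, s in free_slots.items() if s > 0]
            host = candidates[0]                  # local host if possible, else any free host
            assignment[job] = (host, host in replicas[infile])
            free_slots[host] -= 1

        print(assignment)   # job -> (assigned host, input data was local?)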
  • 8. Krishnamoorthy, Sriram Optimizing locality and parallelism through program reorganization

    Doctor of Philosophy, The Ohio State University, 2008, Computer and Information Science

    Development of scalable application codes requires an understanding and exploitation of the locality and parallelism in the computation. This is typically achieved through optimizations by the programmer to match the application characteristics to the architectural features exposed by the parallel programming model. Partitioned address space programming models such as MPI foist a process-centric view of the parallel system, increasing the complexity of parallel programming. Typical global address space models provide a shared memory view that greatly simplifies programming. But the simplified models abstract away the locality information, precluding optimized implementations. In this work, we present techniques to reorganize program execution to optimize locality and parallelism, with little effort from the programmer. For regular loop-based programs operating on dense multi-dimensional arrays, we propose an automatic parallelization technique that attempts to determine a parallel schedule in which all processes can start execution in parallel. When the concurrent tiled iteration space inhibits such execution, we present techniques to re-enable it. This is an alternative to incurring the pipelined startup overhead in schedules generated by prevalent approaches. For less structured programs, we propose a programming model that exposes multiple levels of abstraction to the programmer. These abstractions enable quick prototyping coupled with incremental optimizations. The data abstraction provides a global view of distributed data organized as blocks. A block is a subset of data stored contiguously in a single process' address space. The computation is specified as a collection of tasks operating on the data blocks, with parallelism and dependence being specified between them. When the blocking of the data does not match the required access pattern in the computation, the data needs to be reblocked to improve spatial locality. We develop efficient data layout transformati (open full item for complete abstract)

    Committee: P Sadayappan (Advisor) Subjects: Computer Science
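
    The programming model described above exposes distributed data as blocks and reblocks the data when the blocking does not match a computation's access pattern. A minimal 1-D sketch of reblocking under assumed block sizes; it only illustrates the gather/scatter idea, not the dissertation's layout-transformation algorithms.

        import numpy as np

        # Illustrative reblocking sketch: a 1-D "global" array stored as equal
        # blocks of one size is reorganized into blocks of another size so that
        # tasks touching contiguous ranges read whole blocks. Block sizes are
        # arbitrary assumptions.
        def to_blocks(global_array, block_size):
            return [global_array[i:i + block_size] for i in range(0, len(global_array), block_size)]

        def reblock(blocks, new_size):
            flat = np.concatenate(blocks)           # gather the global view
            return to_blocks(flat, new_size)        # scatter into the new blocking

        data = np.arange(24)
        old = to_blocks(data, 8)    # 3 blocks of 8, e.g. one per process
        new = reblock(old, 6)       # 4 blocks of 6 matching a different access pattern
        print([b.tolist() for b in new])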