
Search Results

(Total results 22)


  • 1. Gula, Govardhan Accelerating Bootstrap Resampling using Two-Step Poisson-Based Approximation Schemes

    Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems

    Bootstrap sampling serves as a cornerstone in statistical analysis, providing a robust method to evaluate the precision of sample-based estimators. As the landscape of data processing expands to accommodate big data, approximate query processing (AQP) emerges as a promising avenue, albeit accompanied by challenges in accurate assessment. By leveraging bootstrap sampling, the errors of sample-based estimators in AQP can be effectively evaluated. However, the implementation of bootstrap sampling encounters obstacles, particularly in the computation-intensive resampling procedure. This thesis embarks on an exploration of various resampling methods, scrutinizing five distinct approaches: the On Demand Materialization (ODM) Method, Conditional Binomial Method (CBM), Naive Method, Two-Step Poisson Random (TSPR), and Two-Step Poisson Adaptive (TSPA). Through rigorous evaluation and comparison of the execution time of each method, this thesis elucidates their relative efficiencies and contributions to AQP analyses within the realm of big data processing. Furthermore, this research contributes to the broader understanding of resampling techniques in statistical analysis, offering insights into their computational complexities and implications for big data analytics. By addressing the challenges posed by AQP in the context of bootstrap sampling, this thesis seeks to advance methodologies for accurate assessment in the era of big data processing.

    Committee: Feng Yu PhD (Advisor); Lucy Kerns PhD (Committee Member); Alina Lazar PhD (Committee Member) Subjects: Computer Science; Engineering; Information Systems; Information Technology; Mathematics
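
    A minimal sketch of the Poisson-resampling idea underlying such two-step methods (illustrative only; the exact TSPR/TSPA algorithms are not spelled out in this abstract). The naive method materializes every resample; the Poisson variant replaces the multinomial resample counts with independent Poisson(1) draws, which are cheap to generate and easy to stream:

      import numpy as np

      rng = np.random.default_rng(0)

      def naive_bootstrap_means(data, n_resamples):
          # Naive method: materialize each resample by drawing n indices.
          n = len(data)
          idx = rng.integers(0, n, size=(n_resamples, n))
          return data[idx].mean(axis=1)

      def poisson_bootstrap_means(data, n_resamples):
          # Poisson approximation: per-element resample counts are drawn
          # as independent Poisson(1) variables instead of a multinomial.
          n = len(data)
          counts = rng.poisson(1.0, size=(n_resamples, n))
          return (counts * data).sum(axis=1) / counts.sum(axis=1)

      data = rng.normal(loc=5.0, scale=2.0, size=10_000)
      print(naive_bootstrap_means(data, 200).std())    # bootstrap standard error
      print(poisson_bootstrap_means(data, 200).std())  # close to the naive value
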
  • 2. Robin, DEBOBROTO DAS LEVERAGING PISA SWITCHES FOR TRAFFIC-AWARE IN-NETWORK LOAD BALANCING IN DATA CENTER NETWORKS

    PHD, Kent State University, 2023, College of Arts and Sciences / Department of Computer Science

    Over the last two decades, the exponential growth in data consumption, applications to facilitate them, and the evolution of application architectures have given rise to high-entropy traffic patterns through data center networks. Load balancing plays a pivotal role in managing this complex traffic pattern through link layer and application layer load balancing. The conventional load balancing methods, while effective to some extent, have faced challenges in responding to the dynamic nature of the traffic pattern. To tackle this challenge, the load balancing concept has transformed, with the emergence of traffic awareness as a critical factor in its implementation. It requires dynamic weight assignment/update for the paths while load balancing to adapt to high traffic variation. Recently emerging “Protocol Independent Switch Architecture” (PISA) based programmable switches have unleashed the scope of implementing custom load balancing algorithms inside the network fabric. These in-network load balancers have the potential to achieve scalable, high-throughput, and highly traffic-aware load balancing at the data center scale. However, PISA switches are designed with a limited per-packet computational budget to achieve extremely high throughput in the 6-100 Tbps range. Data center switches need to implement a large set of complex protocols and features. Delegating load-balancing logic to these switches consumes a costly computational budget, making implementing other primary features infeasible. As a result, existing PISA switch-based in-network load balancers are primarily designed for either link or application layer load balancing to meet this budget constraint. Moreover, they cannot offer scalability for the same reason. This dissertation takes the next step to fill this critical gap and designs a weighted cost mechanism-based configurable load balancer (CLB) using PISA switches. CLB can offer scalable load balancing for both the link and ap (open full item for complete abstract)

    Committee: Javed I. Khan (Advisor) Subjects: Communication; Computer Science
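
    The "dynamic weight assignment for paths" idea can be pictured with a WCMP-style replicated table (a hedged sketch, not CLB's actual data-plane mechanism; on a PISA switch this would be match-action tables rather than Python):

      import hashlib

      def build_weighted_table(weights, table_size=64):
          # Replicate each path id in proportion to its current weight.
          total = sum(weights.values())
          table = []
          for path, w in weights.items():
              table += [path] * max(1, round(table_size * w / total))
          return table

      def pick_path(five_tuple, table):
          # Hash the flow identifier so every packet of a flow takes one path.
          digest = hashlib.sha256(repr(five_tuple).encode()).digest()
          return table[int.from_bytes(digest[:4], "big") % len(table)]

      weights = {"path0": 3, "path1": 1}     # traffic-aware, updated at run time
      table = build_weighted_table(weights)
      print(pick_path(("10.0.0.1", "10.0.0.2", 6, 12345, 80), table))
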
  • 3. Devasani, Lakshmi Prasanna Katyayani Solidity Compiler Version Identification on Smart Contract Bytecode

    Master of Science in Cyber Security (M.S.C.S.), Wright State University, 2023, Computer Science

    Identifying the version of the Solidity compiler used to create an Ethereum contract is a challenging task, especially when the contract bytecode is obfuscated and lacks explicit metadata. Ethereum bytecode is highly complex, as it is generated by the Solidity compiler, which translates high-level programming constructs into low-level, stack-based code. Additionally, the Solidity compiler undergoes frequent updates and modifications, resulting in continuous evolution of bytecode patterns. To address this challenge, we propose using deep learning models to analyze Ethereum bytecodes and infer the compiler version that produced them. A large number of Ethereum contracts and the corresponding compiler versions are used to train these models. The dataset includes contracts compiled with various versions of the Solidity compiler. We preprocess the dataset to extract opcode sequences from the bytecode, which serve as inputs for the deep learning models. We use advanced sequence learning methods such as bidirectional long short-term memory (Bi-LSTM), convolutional neural network (CNN), CNN+Bi-LSTM, Transformer, and Sentence BERT (SBERT) to capture the semantics of the opcode sequences. We analyze each model's performance using metrics such as accuracy, precision, recall, and F1-score. Our results demonstrate that our developed models excel at identifying the Solidity compiler version used in smart contracts with high accuracy. We also compare our methods with non-sequence learning models, showing that our models outperform them in most cases. This highlights the advantages of our proposed approaches for identifying Solidity compiler versions from Ethereum bytecodes.

    Committee: Lingwei Chen Ph.D. (Advisor); Meilin Liu Ph.D. (Committee Member); Junjie Zhang Ph.D. (Committee Member) Subjects: Computer Engineering; Computer Science; Information Technology
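
    The opcode-extraction preprocessing step can be sketched directly from the EVM encoding, where PUSH1..PUSH32 (0x60..0x7f) carry 1 to 32 immediate bytes that must be skipped (a minimal sketch; the thesis's actual pipeline is not shown here):

      def extract_opcodes(bytecode_hex):
          code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
          ops, i = [], 0
          while i < len(code):
              op = code[i]
              ops.append(op)
              if 0x60 <= op <= 0x7F:       # PUSHn: skip its n immediate bytes
                  i += op - 0x60 + 1
              i += 1
          return ops

      # 0x6080604052 decodes to PUSH1 0x80, PUSH1 0x40, MSTORE
      print(extract_opcodes("0x6080604052"))   # [96, 96, 82] == [0x60, 0x60, 0x52]
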
  • 4. Bagnall, Alexander Formally Verified Samplers From Discrete Probabilistic Programs

    Doctor of Philosophy (PhD), Ohio University, 2023, Electrical Engineering & Computer Science (Engineering and Technology)

    This dissertation presents Zar: a formally verified compilation pipeline from discrete probabilistic programs in the conditional probabilistic guarded command language (cpGCL) to proved-correct executable samplers in the random bit model. Zar exploits the key idea that discrete probability distributions can be reduced to unbiased coin-flipping schemes. The compiler pipeline first translates cpGCL programs into choice-fix trees, an intermediate representation suitable for reduction of biased probabilistic choices. Choice-fix trees are then translated to coinductive interaction trees for execution within the random bit model. The correctness of the composed translations establishes the sampling equidistribution theorem: compiled samplers are correct with respect to the conditional weakest pre-expectation (cwp) semantics of their cpGCL source programs. Zar is implemented and fully verified in the Coq proof assistant. We extract verified samplers to OCaml and Python and empirically validate them on a number of illustrative examples. We additionally present AlgCo (Algebraic Coinductives), a practical framework for inductive reasoning over coinductive types such as conats, streams, and infinitary trees with finite branching factor, developed during the course of this work to enable convenient formal reasoning for coinductive samplers generated by Zar. The key idea is to exploit the notion of algebraic CPO from domain theory to define continuous operations over coinductive types via primitive recursion on "dense" collections of their elements, enabling a convenient strategy for reasoning about algebraic coinductives by straightforward proofs by induction. We implement the AlgCo library in Coq and demonstrate its utility by verifying a stream variant of the sieve of Eratosthenes, a regular expression library based on coinductive tries, and weakest pre-expectation semantics for potentially nonterminating sampling processes over discrete probability distributions in the r (open full item for complete abstract)

    Committee: David Juedes (Advisor); James Stewart (Committee Member); Vladimir Uspenskiy (Committee Member); Jundong Liu (Committee Member); Anindya Banerjee (Committee Member); David Chelberg (Committee Member) Subjects: Computer Science
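
    The "unbiased coin-flipping" reduction at the heart of Zar can be illustrated with the classic construction for a single biased coin (a sketch of the general idea only, not Zar's verified pipeline): lazily compare a uniform bit stream against the binary expansion of p.

      import random

      def bernoulli(p):
          # Generate U = 0.u1 u2 ... bit by bit and compare against
          # p = 0.b1 b2 ...; the first differing bit decides whether U < p.
          # Expected number of fair flips: 2.
          while True:
              p *= 2
              b, p = (1, p - 1.0) if p >= 1.0 else (0, p)
              u = random.getrandbits(1)
              if u != b:
                  return u < b          # True exactly when U < p

      print(sum(bernoulli(0.3) for _ in range(100_000)) / 100_000)  # about 0.3
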
  • 5. Bao, Wenlei Compiler Techniques for Transformation Verification, Energy Efficiency and Cache Modeling

    Doctor of Philosophy, The Ohio State University, 2018, Computer Science and Engineering

    Performance has been the focus of computer systems for decades, from Moore's law in the past to today's parallel computers. Compiler optimizations are used to improve performance by generating code that uses hardware components (e.g., caches) efficiently. However, modern systems, such as large-scale systems, require not only performance but also resilience and energy efficiency. Increasing concern about system resilience and energy efficiency has been shown in both industry and academia. Errors within applications, especially those that escape detection and result in silent data corruption, are extremely problematic. Thus, in order to improve the resilience of applications, error detection and vulnerability characterization techniques are an important step toward fault-tolerant applications. Compiler transformations, which restructure programs to improve performance by leveraging data locality and parallelism, are often complex and may harbor bugs that lead to errors in transformed programs. It is therefore essential to guarantee their correctness; however, current approaches suffer from various problems, such as the limited set of transformations supported or their space complexity. This dissertation presents a novel approach that performs dynamic verification by inserting lightweight checker code to detect errors in transformations. Errors, if they exist, are exposed by executing the checker-inserted transformed program. Energy efficiency is of increasing importance in scenarios ranging from battery-operated devices to data centers striving for lower energy costs. Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying processor frequency to improve energy efficiency. Typical DVFS approaches involve default strategies such as reacting to the CPU runtime load to adapt frequency, which have inherent limitations because of processor-specific and application-specific effects. This dissertation developed a novel compile-time characterization to select frequency (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan (Advisor); Gagan Agrawal (Committee Member); Radu Teodorescu (Committee Member); Louis-Noel Pouchet (Committee Member); Sriram Krishnamoorthy (Committee Member) Subjects: Computer Science
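
    The checker-insertion idea can be pictured on a toy kernel (a hedged sketch; the dissertation's checkers and the transformations they guard are far more general): the transformed code carries a lightweight runtime check of an invariant the transformation must preserve.

      import numpy as np

      def scale_original(A):
          out = np.empty_like(A)
          for i in range(A.shape[0]):          # reference iteration order
              for j in range(A.shape[1]):
                  out[i, j] = 2.0 * A[i, j]
          return out

      def scale_interchanged_checked(A):
          out = np.empty_like(A)
          for j in range(A.shape[1]):          # transformed: loops interchanged
              for i in range(A.shape[0]):
                  out[i, j] = 2.0 * A[i, j]
          # inserted lightweight checker: a violation exposes a transformation bug
          assert np.isclose(out.sum(), 2.0 * A.sum())
          return out

      A = np.arange(12.0).reshape(3, 4)
      print(np.array_equal(scale_original(A), scale_interchanged_checked(A)))
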
  • 6. Liu, Lifeng An Optimization Compiler Framework Based on Polyhedron Model for GPGPUs

    Doctor of Philosophy (PhD), Wright State University, 2017, Computer Science and Engineering PhD

    General purpose GPU (GPGPU) is an effective many-core architecture that can yield high throughput for many scientific applications with thread-level parallelism. However, several challenges still limit further performance improvements and make GPU programming challenging for programmers who lack knowledge of the GPU hardware architecture. In this dissertation, we describe an Optimization Compiler Framework Based on Polyhedron Model for GPGPUs to bridge the speed gap between the GPU cores and the off-chip memory and improve the overall performance of GPU systems. The optimization compiler framework includes a detailed data reuse analyzer based on the extended polyhedron model for GPU kernels, a compiler-assisted programmable warp scheduler, a compiler-assisted cooperative thread array (CTA) mapping scheme, a compiler-assisted software-managed cache optimization framework, and a compiler-assisted synchronization optimization framework. The extended polyhedron model is used to detect intra-warp data dependencies and cross-warp data dependencies, and to perform data reuse analysis. The compiler-assisted programmable warp scheduler for GPGPUs takes advantage of inter-warp data locality and intra-warp data locality simultaneously. The compiler-assisted CTA mapping scheme is designed to further improve the performance of the programmable warp scheduler by taking inter-thread-block data reuse into consideration. The compiler-assisted software-managed cache optimization framework is designed to make better use of the shared memory of GPU systems and bridge the speed gap between the GPU cores and global off-chip memory. The synchronization optimization framework is developed to automatically insert synchronization statements into GPU kernels at compile time, while simultaneously minimizing the number of inserted synchronization statements. Experiments are designed and conducted to validate our optimization compiler framework. Experimental results show that our op (open full item for complete abstract)

    Committee: Meilin Liu Ph.D. (Advisor); Jack Jean Ph.D. (Committee Member); Travis Doom Ph.D. (Committee Member); Jun Wang Ph.D. (Committee Member) Subjects: Computer Engineering; Computer Science
  • 7. Sengupta, Aritra Efficient Compiler and Runtime Support for Serializability and Strong Semantics on Commodity Hardware

    Doctor of Philosophy, The Ohio State University, 2017, Computer Science and Engineering

    Parallel software systems often have non-deterministic behavior and suffer from reliability issues. A major deterrent to the use of sound and precise bug-detection techniques is the impractical run-time overhead and poor scalability of such analyses. Alternatively, strong memory models or program execution models could achieve software reliability. Unfortunately, existing analyses and systems that provide stronger guarantees in ill-synchronized programs are either expensive or they compromise on reliability for performance. Building efficient runtime support for enforcing strong program semantics remains a challenging problem because of the overhead incurred to implement the complex mechanisms in the underlying system. The overhead could be because of restrictions on reordering of operations in the compiler or the hardware, the instrumentation cost of enforcing stronger program semantics through a dynamic analysis, the cost of concurrency control or synchronization mechanisms typically required in such an analysis, and the cost of leveraging specific hardware constructs. Researchers have proposed strong memory models based on serializability of regions of code, where regions of code across threads execute in isolation from each other. The run-time cost incurred to enforce strong memory models based on serializability stems from the unbounded number of operations that are monitored by an analysis. This thesis solves this critical problem of providing region serializability at reasonable overheads, and yet demonstrates its efficacy in eliminating erroneous behavior from ill-synchronized programs. Providing serializability of code regions using compiler techniques and dynamic analysis, while leveraging both generic and specialized commodity hardware, at low overheads, across concurrent programs having diverse characteristics, can make enforcement of strong semantics for real-world programs practical and scalable. This work presents a memory model based on bounded regions, call (open full item for complete abstract)

    Committee: Michael Bond (Advisor) Subjects: Computer Engineering
  • 8. Nguyen, Hoang Steve - A Programming Language for Packet Processing

    Master of Science, University of Akron, 2016, Computer Science

    Software-defined networking (SDN) aims to make network switches programmable to enable a class of intelligent networking applications that can automate network flow direction in ways that conventional switches cannot. We present Steve, a protocol-independent, domain-specific language (DSL) for writing these networking applications on SDN devices. Steve provides high-level language features for expressing protocol structure, decoding rules, forwarding decisions, packet manipulation, and event handling for reactive non-distributed control planes. These features define a packet processing pipeline -- the algorithm used to make forwarding decisions. Steve solves two issues in SDN language development: safe packet access and safe pipeline composition. Vulnerabilities in an application running a network switch can be disastrous; therefore, the Steve compiler is designed to catch potential errors. Steve uses a type and constraint system which enforces these safety guarantees. To verify our work, we produced a Steve language compiler which implements these safety guarantees. We also present four compilable Steve applications: a MAC learning switch, an IPv4 learning switch, a stateless firewall, and a wire. These applications are tested with a runtime environment which provides Steve access to switch resources.

    Committee: Andrew Sutton Dr. (Advisor); Kathy Liszka Dr. (Committee Member); Michael Collard Dr. (Committee Member) Subjects: Computer Science
  • 9. Ravishankar, Mahesh Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns

    Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering

    With the era of increasing clock speeds coming to an end, parallel computing architectures have now become mainstream. Due to the wide range of architectures available today that can be used to exploit parallelism, ranging from multicore CPUs, to GPGPUs, to distributed memory machines, adapting applications for efficient execution on all these architectures poses a significant challenge. Scientific computing applications exhibit significant coarse-grained parallelism. Domain experts have exploited this to target distributed memory machines through the use of Message Passing Interface (MPI) libraries. Many such applications have been shown to scale to hundreds of thousands of processors. While powerful, programming in MPI is tedious and error prone, with a significant portion of the parallelized application dedicated to managing communication and synchronization between the processes. Developing compiler techniques that can automatically generate parallel distributed memory versions of such codes is challenging due to the presence of data-dependent control flow and data access patterns. In this thesis we develop compiler algorithms for automatic parallelization of a class of computations common to many scientific applications. Under the inspector/executor paradigm, the generated code partitions and executes the computation in a load-balanced manner, while reducing the communication costs. This approach is further enhanced by developing a framework capable of expressing both affine and non-affine parts of the code. This enables the use of polyhedral compilation tools to analyze parts of the computation which can be completely characterized statically. The effectiveness of the developed approach is demonstrated on several benchmarks and real-world applications. Image processing applications on the other hand exhibit significant fine-grained parallelism and are well suited for architectures with Single-Instruction Multiple Data (SIMD) proc (open full item for complete abstract)

    Committee: P Sadayappan Prof. (Advisor); Atanas Rountev Prof. (Committee Member); Gagan Agrawal Prof. (Committee Member) Subjects: Computer Engineering
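
    The inspector/executor pattern named above can be sketched on an indirect-access kernel (illustrative Python; the real generated code partitions across MPI processes): the inspector scans the runtime index array once to build a plan, and the executor reuses that plan on every run.

      import numpy as np

      def inspector(col, n_parts):
          # Inspect the indirection array once: decide which iterations each
          # partition owns and which x entries it must gather.
          plans = []
          for block in np.array_split(np.arange(len(col)), n_parts):
              plans.append((block, np.unique(col[block])))
          return plans

      def executor(plan, x, col):
          # Execute only the owned iterations; the gather dict stands in
          # for the communication schedule the inspector's plan drives.
          block, needed = plan
          gathered = {g: x[g] for g in needed}
          return sum(gathered[col[i]] for i in block)

      col = np.array([0, 2, 2, 1, 3, 0, 3, 1])   # kernel: y[i] += x[col[i]]
      x = np.array([10.0, 20.0, 30.0, 40.0])
      print([executor(p, x, col) for p in inspector(col, n_parts=2)])
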
  • 10. BHALGAT, ASHISH INSTRUCTION SCHEDULING TO HIDE LOAD/STORE LATENCY IN IRREGULAR ARCHITECTURE EMBEDDED PROCESSORS

    MS, University of Cincinnati, 2001, Engineering : Computer Engineering

    Modern computers are taking increasing advantage of the instruction-level parallelism (ILP) available in application programs. Advances in the architectural design of Application Specific Instruction Core (ASIC) embedded processors often result in complex and irregular processor architectures. Most modern embedded processors use both pipelining and multiple instruction issue techniques in order to execute instructions in parallel and run application programs much faster. Performance of such Very Long Instruction Word (VLIW) architectures is mainly dependent on the capability of the compiler to detect and exploit the ILP. To take advantage of the parallelism inherently available in the code, instructions are reordered so that the length of the code schedule can be reduced. Thus, code motion of instructions may minimize the overall cycle count, resulting in better code execution efficiency. However, code reordering is often constrained by the dependencies among instructions due to the control and data flow inherent in the code. Further, compile-time instruction scheduling for such irregular architectures is a challenging problem due to the architectural constraints imposed on the restructuring of instructions. In this thesis, we present a framework to restructure instructions to exploit the ILP through the parallel instructions available in the Instruction Set Architecture (ISA) of our EPIC VLIW Digital Signal Processor. We focus on the memory access instructions, loads and stores, and optimize the schedule to hide the load/store instruction latency.

    Committee: Santosh Pande (Advisor) Subjects:
  • 11. Thammanur, Sathyanarayan A Just in Time Register Allocation and Code Optimization Framework for Embedded Systems

    MS, University of Cincinnati, 2001, Engineering : Computer Engineering

    Mobile programs are used extensively on embedded systems. These programs typically need to be downloaded as code slices onto the embedded device. Code slices on an embedded system would be compiled “just-in-time” and run. Code size is an important issue for such embedded systems. Register allocators in the current literature have always targeted speed of execution as their key issue and hence are not suited for embedded systems. We describe a “just-in-time”, usage density based register allocator geared towards systems with a limited general-purpose register set, wherein speed, code size, and memory requirements are equally of concern. The main attraction of the allocator is that it does not make use of traditional live range and interval analysis, nor does it perform advanced optimizations based on range splitting or spilling, yet it results in very good code quality. We circumvent the need for traditional analysis by using a measure of the usage density of a variable. The usage density of a variable at a program point represents both the frequency and the density of its uses. We contend that using this measure we can capture both range and frequency information, which is essentially what good allocators based on splitting and spilling use. We describe a two-pass framework based on this measure which has linear complexity in terms of the program size. We perform comparisons with static allocators based on graph coloring and with allocators targeted at dynamic compilation systems, such as linear scan of live ranges. We have implemented our allocator inside the tcc compiler. We find that the usage density allocator generates better code quality than the other allocators for smaller register sets. Our results show an improvement over a Briggs-style allocator, reducing code size by up to 12% for a register set of size 16 in some cases. The results are interesting with decreasing register set size, in terms of both code size and memory requirements. The algorithm (open full item for complete abstract)

    Committee: Dr. Santosh Pande (Advisor) Subjects:
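
    The usage-density measure can be pictured with a toy definition (a guess at the flavor of the metric, number of uses divided by the span they cover; the thesis's exact formula is not given in this abstract):

      def usage_density(use_points):
          # Many uses packed into a short span score highest.
          span = max(use_points) - min(use_points) + 1
          return len(use_points) / span

      def allocate(uses_by_var, n_regs):
          # Greedy: the n_regs highest-density variables get registers; no
          # live-range splitting or spilling, mirroring the abstract.
          ranked = sorted(uses_by_var,
                          key=lambda v: usage_density(uses_by_var[v]),
                          reverse=True)
          return {v: (f"r{i}" if i < n_regs else "memory")
                  for i, v in enumerate(ranked)}

      uses = {"a": [1, 2, 3, 4], "b": [1, 50], "c": [10, 12, 14]}
      print(allocate(uses, n_regs=2))   # a and c in registers, b in memory
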
  • 12. MORE, JOHN EXTENSIBILITY OF AN OBJECT-ORIENTED COMPILER INTERMEDIATE WITH A FOCUS ON CLONING

    MS, University of Cincinnati, 2005, Engineering : Civil Engineering

    This thesis examines a re-design of the AIRE object-oriented compiler intermediate in order to simplify the task of building and adding new back-end analysis tools to the system. In particular, this thesis work proposes a revision to the AIRE intermediate form (which is used by several VHDL front-end compilers) that will simplify the design and use of back-end analysis tools. The revision includes three main parts, namely: (i) a restructuring of the base header files to set up an interface definition for the basic AIRE nodes; (ii) a factory method of AIRE node instantiation; and (iii) a cloning mechanism that enables cloning of the AIRE tree with different inheritance hierarchies in the AIRE nodes. In pursuit of this re-design effort, several different possible solutions were explored and evaluated. In particular, we studied three different approaches and developed small working prototypes of one before embarking on a full-scale deployment of the cloning method in the SAVANT implementation of AIRE. The cloning method was chosen because it is best able to meet the goals of this project. Our goals included simplicity, ease of use, and flexibility. After implementing the abstract base classes and cloning mechanism into AIRE, the C++ code generator backend of SAVANT was moved to a “cloned” extension of AIRE and its performance (both compliance and runtime) evaluated against the original implementation. Preliminary performance analysis shows that the system continues to provide the same level of functionality as the previous approach. However, it is slower during execution and consumes more memory. These problems are offset by increased productivity gains during compilation, and increased flexibility at runtime.

    Committee: Dr. Philip Wilsey (Advisor) Subjects:
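
    The factory-plus-cloning combination can be rendered in a few lines (a sketch in Python; AIRE itself is C++, and the class names here are illustrative, not real AIRE node types):

      import copy

      class Node:                          # interface role of the base headers
          def clone(self):
              return copy.deepcopy(self)

      class Assignment(Node):
          def __init__(self, target, value):
              self.target, self.value = target, value

      class ExtendedAssignment(Assignment):
          # A back-end extension node carrying extra analysis state.
          def __init__(self, target, value, cost=0):
              super().__init__(target, value)
              self.cost = cost

      class NodeFactory:
          # Front ends ask the factory for nodes, so a back end can install
          # its extended hierarchy without changing front-end code.
          node_class = Assignment
          @classmethod
          def make_assignment(cls, target, value):
              return cls.node_class(target, value)

      NodeFactory.node_class = ExtendedAssignment   # back end installs subclass
      node = NodeFactory.make_assignment("x", 42)
      print(type(node.clone()).__name__)            # ExtendedAssignment
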
  • 13. Narayan, Mohanish PolyOpt/Fortran: A Polyhedral Optimizer for Fortran Programs

    Master of Science, The Ohio State University, 2012, Computer Science and Engineering

    Fortran has been one of the most prominent languages in the domain of scientific computing. Many of the problems it is used to solve are CPU intensive, involving complex memory access patterns, which are often difficult to optimize for performance, both for contemporary compilers and programmers alike. A large body of code already exists that was written with the CPU architectures of earlier times in mind. These programs are unable to harness high performance on the multi-core parallel architectures that exist today. Hence a mechanism to optimize such programs to take advantage of current technology is highly desirable. The polyhedral model is a powerful mathematical framework for abstracting memory access patterns and statement dependencies to create transformations that take advantage of the memory hierarchy and parallelism, which is critical for high performance on current generation processors. In this work we discuss the anatomy and working of PolyOpt/Fortran, a source-to-source optimizing compiler based on the polyhedral model. We also analyze the impact on performance of applying such transformations to various computational kernels.

    Committee: Dr. Atanas Rountev (Advisor); Dr. Ponnuswamy Sadayappan (Committee Member) Subjects: Computer Science
  • 14. Xu, Guoqing Analyzing Large-Scale Object-Oriented Software to Find and Remove Runtime Bloat

    Doctor of Philosophy, The Ohio State University, 2011, Computer Science and Engineering

    Over the past decade, the pervasive use of object-oriented languages and the increasing complexity of problems solved by computer software have led to the proliferation of large-scale framework-intensive applications. These applications are typically built by combining standard packages (e.g., J2EE application server frameworks), third-party layers for domain-specific functionality, and in-house solutions. While employing libraries and frameworks eases the development effort, it can be expensive to invoke the general APIs provided by these libraries, especially when they are used for simple tasks. As a result, many applications suffer from excessive memory footprint caused by chronic runtime bloat that significantly impacts scalability and performance. In addition, programmers are taught to pay more attention to abstractions and models, and to leave the performance optimization to the runtime system. While a few redundant objects, calls, and field copies may seem insignificant, the problem quickly gets exacerbated due to nesting and layering. At some point, these inefficiencies accumulate, and compilers (e.g., the JIT in a JVM) can no longer eliminate them, since layers of abstractions grow to be deep, dynamic, and well beyond the capabilities of compiler analysis. In this dissertation, the term bloat is used to refer to the general phenomenon of using excessive work and memory to achieve seemingly simple tasks. Bloat exists commonly in large-scale object-oriented programs and it is a major challenge that stands in the way of bridging the productivity-performance gap between managed and unmanaged languages. The overarching goal of our work is to find large bloat-related optimization opportunities, with a small amount of developer time. As a fundamental methodology to achieve this goal, we advocate tool-assisted manual optimization, in order to combine developer insight with the automated tool support. In particular, we have designed, implemented, and evaluated code (open full item for complete abstract)

    Committee: Atanas Rountev (Advisor); Feng Qin (Committee Member); Michael Bond (Committee Member) Subjects: Computer Science
  • 15. Gururaghavendran, Ashwin Applying Polyhedral Transformation to Fortran Programs

    Master of Science, The Ohio State University, 2011, Computer Science and Engineering

    A large number of computationally intensive scientific programs still exist as Fortran programs. High performance compilers aim at reducing the execution time of programs by source-to-source translation involving appropriate transformations. The polyhedral model for compiler optimization is a powerful mathematical framework to abstract and represent the computation in loops and the data dependencies. There are powerful transformation and optimization techniques that can be applied using this model. This research aims at applying such polyhedral transformations to Fortran programs and characterizing them. We present the design and implementation of identifying the Static Control Parts (SCoPs) in Fortran programs and optimizing them using PolyOpt, a fully automatic source-to-source translator. We have instrumented this compiler framework with the help of the PolyBench benchmarks and characterized the effectiveness of various transformations in the polyhedral model applied to Fortran programs.

    Committee: Dr. Ponnuswamy Sadayappan (Advisor); Dr. Atanas Rountev (Committee Member) Subjects: Computer Science
  • 16. Chinnaswamy, Karthiyayini Compile Time Extraction And Instrumentation of Affine Program Kernels

    Master of Science, The Ohio State University, 2010, Computer Science and Engineering

    A common observation in large scientific computation programs is that 90% of the execution time is spent in 10% of the actual code, which may consist of loops with different nesting levels. Improving the performance of such loop codes or “compute kernels” is of significant interest to the compiler community. Modern high-performance compilers aim at improving the execution time of programs by automatically translating/transforming the original code into an equivalent efficient sequential or parallel code. Exploiting parallelism in a parallel processing system and exploiting data locality in a single processor system are the major challenges for a high-performance compiler. These challenges can be effectively addressed by a compiler by developing optimizations using a mathematical framework called the polyhedral model. The polyhedral model provides a powerful abstraction to reason about transformations on collections of loop nests that constitute the compute kernels. This research aims at identifying these compute kernels (called Static Control Parts or SCoPs) in scientific programs, which can be represented and optimized using the polyhedral model. The SCoP extraction algorithm is implemented and instrumented on SPEC and other benchmarks using a compiler infrastructure called ROSE, developed at Lawrence Livermore National Laboratory.

    Committee: Dr. P. Sadayappan (Advisor); Nasko Atanas Rountev (Committee Member) Subjects: Computer Science
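
    What makes a loop nest a SCoP is, at its core, affine control and subscripts. A crude illustration of the test on index expressions (a hypothetical helper, far simpler than the ROSE-based implementation):

      import ast

      def is_affine(expr, loop_vars):
          # Affine: loop variables and integer constants combined with +, -,
          # and multiplication where one operand is a constant.
          root = ast.parse(expr, mode="eval").body
          def check(n):
              if isinstance(n, ast.Constant):
                  return isinstance(n.value, int)
              if isinstance(n, ast.Name):
                  return n.id in loop_vars
              if isinstance(n, ast.BinOp):
                  if isinstance(n.op, (ast.Add, ast.Sub)):
                      return check(n.left) and check(n.right)
                  if isinstance(n.op, ast.Mult):
                      return (isinstance(n.left, ast.Constant) and check(n.right)) \
                          or (isinstance(n.right, ast.Constant) and check(n.left))
              return False
          return check(root)

      print(is_affine("2*i + j - 1", {"i", "j"}))   # True: polyhedral-friendly
      print(is_affine("i*j", {"i", "j"}))           # False: not affine
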
  • 17. Sreenivasa Murthy, Giridhar Optimal Loop Unrolling for GPGPU Programs

    Master of Science, The Ohio State University, 2009, Computer Science and Engineering

    Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. GPUs are primarily designed for accelerating 3D graphics applications on modern computer systems and are therefore specialized for highly data parallel, compute intensive problems, unlike general-purpose CPUs. In recent times, there has been significant interest in finding ways to accelerate general purpose (non-graphics), data parallel computations using the high processing power of GPUs. General-purpose Programming on GPUs (GPGPU) was earlier considered difficult because the only available techniques to program the GPUs were graphics-specific programming models such as OpenGL and DirectX. However, with the advent of GPGPU programming models such as NVIDIA's CUDA and the new standard OpenCL, GPGPU has become mainstream. Optimizations performed by the compiler play a very important role in improving the performance of computer programs. While compiler optimizations for CPUs have been researched for many decades now, the arrival of GPGPU, and its differences in architecture and programming model, has brought along with it many new opportunities for compiler optimizations. One such classical optimization is 'Loop Unrolling'. Loop unrolling has proven to be a relatively inexpensive and beneficial optimization for CPU programs. However, current GPGPU compilers perform little to no loop unrolling. In this thesis, we attempt to understand the impact of loop unrolling on GPGPU programs and, using this understanding, we develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we also propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the device being compiled to. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan PhD (Advisor); Atanas Rountev PhD (Committee Member) Subjects: Computer Science
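
    The shape of the transformation, and the brute-force flavor of evaluating candidate unroll factors, can be shown in plain Python (GPGPU unrolling happens in CUDA source or PTX; this is only an analogue):

      import timeit

      def saxpy(n, a, x, y):
          for i in range(n):
              y[i] += a * x[i]

      def saxpy_unrolled4(n, a, x, y):
          i = 0
          while i + 4 <= n:             # body replicated 4x: fewer loop tests
              y[i]     += a * x[i]
              y[i + 1] += a * x[i + 1]
              y[i + 2] += a * x[i + 2]
              y[i + 3] += a * x[i + 3]
              i += 4
          for j in range(i, n):         # remainder loop when 4 does not divide n
              y[j] += a * x[j]

      n = 100_000
      x, y = [1.0] * n, [2.0] * n
      for f in (saxpy, saxpy_unrolled4):   # crude evaluation of unroll factors
          print(f.__name__, timeit.timeit(lambda f=f: f(n, 2.0, x, y), number=10))
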
  • 18. Bondhugula, Uday Kumar Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model

    Doctor of Philosophy, The Ohio State University, 2008, Computer Science and Engineering

    Multicore processors have now become mainstream. The difficulty of programming these architectures to effectively tap the potential of multiple processing units is well-known. Among several ways of addressing this issue, one of the very promising and simultaneously hard approaches is Automatic Parallelization. This approach does not require any effort on the part of the programmer in the process of parallelizing a program. The polyhedral model for compiler optimization is a powerful mathematical framework based on parametric linear algebra and integer linear programming. It provides an abstraction to represent nested loop computation and its data dependences using integer points in polyhedra. Complex execution reordering, which can improve performance by parallelization as well as locality enhancement, is captured by affine transformations in the polyhedral model. With several recent advances, the polyhedral model has reached a level of maturity in various aspects – in particular, as a powerful intermediate representation for performing transformations, and code generation after transformations. However, an approach to automatically find good transformations for communication-optimized coarse-grained parallelization together with locality optimization has been a key missing link. This dissertation presents a new automatic transformation framework that solves the above problem. Our approach works by finding good affine transformations through a powerful and practical linear cost function that enables efficient tiling and fusion of sequences of arbitrarily nested loops. This in turn allows simultaneous optimization for coarse-grained parallelism and locality. Synchronization-free parallelism and pipelined parallelism at various levels can be extracted. The framework can be targeted to different parallel architectures, like general-purpose multicores, the Cell processor, GPUs, or embedded multiprocessor SoCs. The proposed framework has been impl (open full item for complete abstract)

    Committee: Prof. P Sadayappan (Advisor); Dr. Atanas Rountev (Committee Member); Prof. Gagan Agrawal (Committee Member); Prof. J Ramanujam (Committee Member) Subjects: Computer Science
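
    Tiling, the central transformation named above, can be illustrated in Python (a sketch of the loop structure a polyhedral code generator would emit, shown here on a transpose):

      def transpose_naive(A, B, n):
          for i in range(n):
              for j in range(n):
                  B[j][i] = A[i][j]

      def transpose_tiled(A, B, n, T=32):
          for ii in range(0, n, T):                 # tile loops
              for jj in range(0, n, T):
                  for i in range(ii, min(ii + T, n)):      # point loops in one tile
                      for j in range(jj, min(jj + T, n)):
                          B[j][i] = A[i][j]         # same points, cache-sized blocks

      n = 64
      A = [[i * n + j for j in range(n)] for i in range(n)]
      B1 = [[0] * n for _ in range(n)]
      B2 = [[0] * n for _ in range(n)]
      transpose_naive(A, B1, n)
      transpose_tiled(A, B2, n)
      print(B1 == B2)    # True: the transformation preserves semantics
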
  • 19. Gao, Xiaoyang Integrated compiler optimizations for tensor contractions

    Doctor of Philosophy, The Ohio State University, 2008, Computer and Information Science

    This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler to synthesize parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute-intensive and consume significant computer resources at national supercomputer centers. The manual development of high-performance parallel programs for them is usually very tedious and time consuming. The TCE system is targeted at reducing the burden on application scientists, by having them specify computations in a high-level form, from which efficient parallel programs are automatically synthesized. The goal of this research is to develop an optimization framework to derive high-performance implementations for a set of given tensor contractions. In particular, the issues investigated include: 1) Development of an efficient in-memory parallel algorithm for a tensor contraction. 2) Design of a performance-model driven framework for a parallel out-of-core tensor contraction. 3) Design of an integrated optimization framework which incorporates multiple high-level program transformation techniques to improve locality and parallelism for a set of tensor contractions. This subject includes the following three topics: 3.1) Modeling interactions between different transformations and assessing their impact on the overall performance, 3.2) Using a search-based strategy to explore the combined space of transformations, providing efficient pruning methods to reduce the search space, 3.3) Using performance-driven models to identify the best combination and generate high-performance code. 4) Incorporation of domain-specific optimizations, such as sparsity and symmetry. We have implemented a (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan (Advisor) Subjects: Computer Science
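
    A single tensor contraction, and the kind of rewriting such a framework applies, can be shown in a few lines (an illustrative contraction, not one from the TCE's chemistry workloads):

      import numpy as np

      # C[i,j] = sum over k,l of A[i,k,l] * B[k,l,j]
      rng = np.random.default_rng(0)
      A = rng.random((8, 6, 5))
      B = rng.random((6, 5, 9))

      # Direct nested-loop evaluation, as a naive synthesis would emit it:
      C_loops = np.zeros((8, 9))
      for i in range(8):
          for j in range(9):
              for k in range(6):
                  for l in range(5):
                      C_loops[i, j] += A[i, k, l] * B[k, l, j]

      # Collapsing the contracted indices turns the whole contraction into
      # one matrix multiply, the rewriting that reaches BLAS-level speed:
      C_gemm = A.reshape(8, 30) @ B.reshape(30, 9)
      print(np.allclose(C_loops, C_gemm))   # True
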
  • 20. Li, Xiaogang Efficient and parallel evaluation of XQuery

    Doctor of Philosophy, The Ohio State University, 2006, Computer and Information Science

    With the increased popularity of XML, querying and processing of XML data have become a very important topic. Most recent work in this area has been in the context of XQuery, which is the XML query language developed by the World Wide Web Consortium (W3C). This dissertation presents our approach for efficient compilation of XQuery queries to facilitate the development of data intensive applications. As XML and XQuery are being used for larger datasets, parallel execution and stream processing are two solutions to reduce storage and/or the execution time. Accordingly, our efforts are focused on optimization of XQuery and generation of efficient code in cluster and streaming environments. Particularly, the issues that we investigate include: 1) efficient optimization of XQuery by designing new analysis and transformation techniques, as well as integrating existing compiler optimization and query optimization techniques; 2) designing new techniques toward efficient parallelization of XQuery; 3) providing a high-level abstraction of a dataset to the application developer through XML Schemas; and 4) code generation for XQuery toward the desired targets, such as clusters and streaming environments.

    Committee: Gagan Agrawal (Advisor) Subjects: Computer Science