Search Results


(Total results 4)


Search Report

  • 1. Mealey, Thomas Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks

    Master of Science (M.S.), University of Dayton, 2018, Electrical Engineering

    Long Short-Term Memory (LSTM) is a powerful neural network algorithm that has been shown to provide state-of-the-art performance in various sequence learning tasks, including natural language processing, video classification, and speech recognition. Once an LSTM model has been trained on a dataset, the utility it provides comes from its ability to infer information from completely new data. Due to the complexity of LSTM models, the so-called inference stage of LSTM can require significant computing power and memory resources to keep up with a real-time workload. Many approaches have been taken to accelerate inference, from offloading computations to GPUs or other specialized hardware, to reducing the number of computations and the memory footprint required by compressing model parameters. This work takes a two-pronged approach to accelerating LSTM inference. First, a model compression scheme called binarization is identified both to reduce the storage size of model parameters and to simplify computations. This technique is applied to training LSTMs for two separate sequence learning tasks, and it is shown to provide prediction performance comparable to that of the uncompressed models. Then, a digital processor architecture, called the Binary Recurrent Unit (BRU), is proposed to accelerate inference for binarized LSTM models. Specifically targeted for FPGA implementation, this accelerator takes advantage of binary model weights and on-chip memory resources to parallelize LSTM inference computations. The BRU architecture is implemented and tested on a Xilinx Z7020 device clocked at 200 MHz. Inference computation time for BRU is evaluated against CPU and GPU inference implementations. BRU is shown to outperform the CPU by as much as 39X and the GPU by as much as 3.8X.

    Committee: Tarek Taha PhD (Advisor); Eric Balster PhD (Committee Member); Vijayan Asari PhD (Committee Member) Subjects: Computer Engineering; Computer Science; Electrical Engineering; Engineering
  • 2. XUE, Daqing Volume Visualization Using Advanced Graphics Hardware Shaders

    Doctor of Philosophy, The Ohio State University, 2008, Computer Science and Engineering

    Graphics hardware based volume visualization techniques have been an active research topic over the last decade. With greater computational power, the availability of large texture memory, and high programmability, modern graphics hardware has come to play an increasingly important role in volume visualization. In the first part of the thesis, we focus on graphics hardware acceleration techniques. In particular, we develop a fast X-ray volume rendering technique using point-convolution. An X-ray image is generated by convolving the voxel projections in the rendering buffer with a reconstruction kernel. Our technique allows users to interactively view large datasets at their original resolutions on standard PC hardware. Later, an acceleration technique for slice-based volume rendering (SBVR) is examined. By means of the early z-culling feature of modern graphics hardware, we can properly set up the z-buffer from isosurfaces to gain a significant improvement in rendering speed for SBVR. The high programmability of the graphics processing unit (GPU) has spurred a great deal of research into this advanced graphics hardware feature. In the second part of the thesis, we first revisit the texture splat for flow visualization. We develop a texture splat vertex shader to achieve fast animated flow visualization. Furthermore, we develop a new rendering shader for implicit flow. By carefully tracking and encoding the advection parameters into a three-dimensional texture, we achieve high appearance control and flow representation in real-time rendering. Finally, we present an indirect shader synthesizer that combines different shader rendering effects to create a highly informative image for visualizing the data under investigation. One or more shaders are associated with the voxels or geometries, and the shader to use for rendering is resolved at run time. Our indirect shader synthesizer provides a novel method to control the appearance of the (open full item for complete abstract)

    Committee: Roger Crawfis PhD (Advisor); Raghu Machiraju PhD (Committee Member); Han-Wei Shen PhD (Committee Member) Subjects: Computer Science
  • 3. Schultek, Brian Design and Implementation of the Heterogeneous Computing Device Management Architecture

    Master of Science (M.S.), University of Dayton, 2014, Electrical Engineering

    In this thesis, a novel software architecture called the Heterogeneous Computing Device Management Architecture (HCDMA) is introduced. The HCDMA is designed to address the growing problem of managing PCIe-based acceleration devices. This type of architecture is ideal for computational acceleration in environments where size, weight, and power must be balanced in high-performance computing solutions. The HCDMA, when coupled with an external PCIe expansion chassis, fills the need for a flexible and scalable solution to this problem. By utilizing the HCDMA with external FPGA acceleration modules, an observed 4.16 times improvement is achieved over the industry-standard software solution for JPEG2000 image compression, as well as a 2.94 times improvement over a software-based image pre-processing algorithm tool chain.

    Committee: Eric Balster Ph.D. (Advisor); Frank Scarpino Ph.D. (Committee Member); John Weber Ph.D. (Committee Member) Subjects: Computer Engineering; Electrical Engineering
  • 4. Haines, Wesley Acceleration of the Weather Research & Forecasting (WRF) Model using OpenACC and Case Study of the August 2012 Great Arctic Cyclone

    Master of Science, The Ohio State University, 2013, Atmospheric Sciences

    This work presents two research projects. The first boosts the performance of a weather model by extending the Weather Research and Forecasting (WRF) Model code with OpenACC, a directive-based parallel programming standard. Combined with a compatible compiler, this allows WRF to run select subroutines on accelerators such as the popular NVIDIA Tesla cards. Preliminary results show a 1.2x speed-up of the overall model from simply adding around 20 lines of easy-to-understand compiler directives to just one of several WRF physics schemes. The latest hardware from NVIDIA and Intel may allow further speed-up in future work due to hardware design improvements and better memory management. This modified model is then used in a second project, a case study of the August 2012 "Great Arctic Cyclone" (3-13 August). The case study simulates the atmospheric processes that helped to intensify the storm and assesses whether the storm had a major effect on the dramatic decline of sea ice extent in August. Results show that interaction with an upper-level vortex and warm-air advection near the surface helped the cyclone persist. Strong surface winds assisted in breaking up the already-melting sea ice. Existing literature points to upwelling of warm ocean water, and the combination of these conditions rapidly melted sea ice from above and below. The unprecedented loss of sea ice starting on 6 August 2012 and over the following two weeks, in addition to other influences, contributed to the 2012 Arctic sea ice extent minimum surpassing that of 2007, reaching the lowest on record for the satellite era.

    Committee: David Bromwich (Advisor); Jay Hobgood (Committee Member); Jialin Lin (Committee Member) Subjects: Atmospheric Sciences; Climate Change; Computer Science