Search Results

(Total results 6)


  • 1. Bokhari, Saniyah Parallel Solution of the Subset-sum Problem: An Empirical Study

    Master of Science, The Ohio State University, 2011, Computer Science and Engineering

    We investigate the parallelization of an algorithm on three very different architectures: a 128-processor Cray XMT massively multithreaded machine, a 16-processor IBM x3755 shared-memory machine, and a 240-core NVIDIA FX 5800 graphics processing unit (GPU). The problem we use in our investigation is the well-known subset-sum problem. While this is known to be NP-complete, it is solvable in pseudo-polynomial time, i.e., time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. The hypothesis that we wish to test is that the Cray, with its specialized hardware and large uniform shared memory, is suitable for very large problems; the IBM x3755 is suitable for intermediate-sized problems; and the NVIDIA FX 5800 can give superior performance only for problems that fit within its modest internal memory. We show that it is straightforward to parallelize this algorithm on the Cray XMT, primarily because of the word-level locking that is available on this architecture. For the other two machines we present an alternating word algorithm that can implement an efficient solution. The timings of our respective codes were carefully measured over a comprehensive range of problem sizes. On the Cray XMT we observe very good scaling for large problems and see sustained performance as the problem size increases. However, this machine has poor scaling for small problem sizes; it performs best for problem sizes of 10^12 bits or more. The IBM x3755 performs very well on medium-sized problems, but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 10^11 bits. The NVIDIA GPU performs well for problems whose tables fit within its 4 GB device memory. This corresponds to tables of size approximately 10^10 bits. 
The experimental measurements support ou (open full item for complete abstract)

    Committee: Ten H. Lai PhD (Advisor); Dong Xuan PhD (Committee Member) Subjects: Computer Engineering; Computer Science
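
The pseudo-polynomial table this abstract refers to is the standard subset-sum dynamic program; a minimal serial sketch (not the thesis's parallel Cray/GPU code) looks like this:

```python
def subset_sum(items, target):
    """Classic pseudo-polynomial DP for subset-sum.

    reachable[s] is True when some subset of the items seen so far
    sums to s. The table has target + 1 entries, so the work is
    O(len(items) * target) -- the "number of input objects multiplied
    by the sum of their sizes" bound the abstract mentions.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in items:
        # iterate downward so each item is used at most once
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]
```

For example, `subset_sum([3, 34, 4, 12, 5, 2], 9)` is true because 4 + 5 = 9. The parallelization question the thesis studies is how to update the bit rows of this table concurrently across processors.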
  • 2. Morrison, Adrian An Efficient Method for Computing Excited State Properties of Extended Molecular Aggregates Based on an Ab-Initio Exciton Model

    Doctor of Philosophy, The Ohio State University, 2017, Chemistry

    In this work, we outline the development, testing, and application of a novel electronic structure method for computing the properties of excited states of molecular aggregates. The method is an ab initio realization of the molecular exciton model, proposed long ago by Frenkel and Davydov to describe excited states of molecular crystals, and is called the Ab-Initio Frenkel Davydov Exciton Model (AIFDEM). The AIFDEM ansatz follows the traditional exciton model by expanding the supersystem excited-state wavefunction as a linear combination of excitations that are localized on the component molecules. Our method is a truly ab initio implementation of this model, as the requisite fragment excited states and the exciton Hamiltonian matrix are computed rigorously, including exact Coulomb and Hartree-Fock exchange interactions, without any neglect-of-overlap, nearest-neighbor, or other common approximations. We have tested this method and found that it can reproduce excitation energies of water clusters, DNA bases, and organic chromophores to within ~0.1 eV. A charge-embedding scheme is able to reduce the scaling of this method to only quadratic in the number of fragments and provides near-perfect parallel performance without reducing the accuracy, significantly outperforming traditional approaches. The method was utilized to investigate the excitation energy transfer dynamics of a naphthalene-diimide nanotube, where it was found that model systems beyond the scope of traditional methods are necessary for a fully detailed mechanistic picture, including the role of quantum coherence. Analytic derivatives of the AIFDEM Hamiltonian are derived and implemented, providing access to non-adiabatic couplings as well as Holstein and Peierls electron-phonon coupling constants. This is applied to the challenging electronic structure of the singlet exciton fission process to identify vibrational modes key to the mechanism. 
Dynamics simulations, using parameters computed via th (open full item for complete abstract)

    Committee: Sherwin Singer (Advisor); Heather Allen (Committee Member); Terry Gustafson (Committee Member) Subjects: Physical Chemistry
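
The exciton ansatz the abstract describes — supersystem excited states as linear combinations of monomer-localized excitations — can be illustrated with a toy two-molecule model. The site energies and coupling below are made-up numbers, and the 2x2 Hamiltonian is a textbook Frenkel model, not the AIFDEM itself:

```python
import math

def frenkel_dimer_states(e1, e2, J):
    """Toy Frenkel exciton model for a molecular dimer.

    Basis: an excitation localized on monomer 1 or on monomer 2.
    e1, e2 are the monomer excitation energies and J the electronic
    coupling between the localized excitations (all illustrative).
    Diagonalizing H = [[e1, J], [J, e2]] in closed form gives the
    two delocalized exciton states.
    """
    mean = (e1 + e2) / 2.0
    half_gap = math.sqrt(((e1 - e2) / 2.0) ** 2 + J ** 2)
    return mean - half_gap, mean + half_gap

# Degenerate monomers: the levels split by 2*J (Davydov splitting)
lo, hi = frenkel_dimer_states(3.0, 3.0, 0.1)
```

In the AIFDEM the matrix elements are computed ab initio for every fragment pair; this sketch only shows the structure of the eigenvalue problem that results.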
  • 3. Slatton, Andrew Scalable Algorithms for Delaunay Mesh Generation

    Doctor of Philosophy, The Ohio State University, 2014, Computer Science and Engineering

    Delaunay refinement is a useful tool for sampling and meshing. Pioneered by Ruppert and Chew for piecewise-linear complexes in R^2, Delaunay refinement has been extended to myriad classes of shapes including smooth 1- and 2-manifolds, volumes bounded by smooth 2-manifolds, and piecewise-smooth complexes. Delaunay refinement algorithms often carry certain guarantees regarding the geometric and topological closeness of output to input, as well as guarantees of the quality of mesh elements, making meshes generated via Delaunay refinement a natural choice for simulation and rendering. A major shortcoming of Delaunay refinement is its scalability: as the size of the mesh grows, the data structures necessary to carry out Delaunay refinement efficiently (such as the Delaunay triangulation and its dual, the Voronoi diagram) also grow, and this incurs memory thrashing when generating dense meshes. In this dissertation, we improve Delaunay refinement in two main respects: (1) we improve the memory scalability of Delaunay refinement to allow for the generation of truly huge meshes, and (2) we improve the time scalability of Delaunay refinement by developing a novel parallel algorithm. To address the issue of memory scalability, we developed a localized refinement method embodying a divide-and-conquer paradigm for meshing smooth surfaces. The algorithm divides the sampling of the input domain via an octree and meshes one node of the octree at a time, thus reducing memory requirements. Our theoretical results show that the algorithm terminates, and that the output is not only consistent but also geometrically close to the input and a subcomplex of the Delaunay triangulation of the sampling restricted to the input surface. Our initial work nicely addresses the aforementioned shortcoming of Delaunay refinement, but only for smooth 2-manifolds. In later work, we extended this technique to another important input class: volumes bounded by smooth 2-manifolds. 
It is not im (open full item for complete abstract)

    Committee: Tamal Dey (Advisor); Yusu Wang (Committee Member); Rephael Wenger (Committee Member) Subjects: Computer Science
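
The element-quality guarantee mentioned in this abstract is usually stated via the radius-edge ratio: Ruppert/Chew-style refinement splits any element whose circumradius divided by its shortest edge exceeds a bound. A plain 2D version of that quality measure (an illustration, not the dissertation's surface/volume mesher):

```python
import math

def radius_edge_ratio(a, b, c):
    """Radius-edge ratio of a triangle with vertices a, b, c.

    Delaunay refinement inserts the circumcenter of any triangle whose
    ratio exceeds a threshold, which bounds the smallest angle of the
    final mesh. Uses Heron's formula for the area and the identity
    R = (|e1| * |e2| * |e3|) / (4 * area) for the circumradius.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    ea, eb, ec = dist(b, c), dist(a, c), dist(a, b)
    s = (ea + eb + ec) / 2.0
    area = math.sqrt(s * (s - ea) * (s - eb) * (s - ec))  # Heron
    circumradius = ea * eb * ec / (4.0 * area)
    return circumradius / min(ea, eb, ec)
```

An equilateral triangle attains the minimum possible ratio, 1/sqrt(3) (about 0.577); "skinny" triangles score much higher and would be refined.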
  • 4. Giridhar, Nandipati Kinetic Monte Carlo simulations of submonolayer and multilayer epitaxial growth over extended time- and length-scales

    Doctor of Philosophy, University of Toledo, 2009, Physics

    The main objective of the work presented in this thesis is to develop new methods to extend the time and length scales of atomistic kinetic Monte Carlo (KMC) simulations. When all the relevant processes and their activation barriers are known, KMC is an extremely efficient method to carry out atomistic simulations over long time scales. However, in some cases (e.g., low-barrier repetitive events) direct KMC simulations may not be sufficient to reach the experimentally relevant length and time scales. Accordingly, we have developed and tested several different parallel KMC algorithms, and have also developed a dynamic boundary allocation (DBA) method to improve parallel efficiency by reducing the number of boundary events. Results for parallel KMC simulations of Ag(111) island coarsening at room temperature, carried out using a large database of processes obtained from previous self-learning KMC simulations, are also presented. We find that at long times the coarsening behavior corresponds to Ostwald ripening. We also find that the inclusion of concerted small-cluster events has a significant impact on the average island size. In addition, we have developed a first-passage-time (FPT) approach to KMC simulations to accelerate KMC simulation of (100) multilayer epitaxial growth with rapid edge diffusion. In our FPT approach, by mapping edge diffusion to a 1D random walk, numerous diffusive hops are replaced with a first-passage time to make one large jump to a new location. As a test, we have applied our method to carry out multilayer growth simulations of three different models. We note that despite the additional overhead, the FPT approach leads to a significant speed-up compared to regular KMC simulations. Finally, we present results obtained from KMC simulations of irreversible submonolayer island growth with strain and rapid island relaxation. 
Our results indicate that in the presence of large strain there is significant anisotropy in qualitative agreement with experiments o (open full item for complete abstract)

    Committee: Amar Jacques (Advisor); Collins Robert (Committee Member); Bigioni Terry (Committee Member); Anderson-huang Lawrence (Committee Member); Kvale Thomas (Committee Member) Subjects: Physics
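
The serial building block that the parallel and first-passage-time variants in this thesis accelerate is the rejection-free (BKL/Gillespie) KMC step: pick an event with probability proportional to its rate, then advance the clock by an exponentially distributed waiting time. A textbook sketch, not the thesis's code:

```python
import math
import random

def kmc_step(rates, rng=random.random):
    """One rejection-free kinetic Monte Carlo step.

    rates: list of rate constants, one per possible event.
    Returns (chosen_event_index, time_increment). The event is chosen
    with probability rate_i / total, and the waiting time is drawn
    from an exponential distribution with mean 1 / total.
    """
    total = sum(rates)
    r = rng() * total
    chosen = len(rates) - 1  # fallback guards against float round-off
    acc = 0.0
    for i, rate in enumerate(rates):
        acc += rate
        if r < acc:
            chosen = i
            break
    dt = -math.log(rng()) / total  # exponential waiting time
    return chosen, dt
```

When one event class (such as rapid edge diffusion) dominates the rate catalog, this loop spends nearly all its steps on it, which is exactly the pathology the thesis's FPT mapping to a 1D random walk avoids.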
  • 5. Gao, Xiaoyang Integrated compiler optimizations for tensor contractions

    Doctor of Philosophy, The Ohio State University, 2008, Computer and Information Science

    This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler to synthesize parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute-intensive and consume significant computer resources at national supercomputer centers. The manual development of high-performance parallel programs for them is usually very tedious and time consuming. The TCE system is targeted at reducing the burden on application scientists, by having them specify computations in a high-level form, from which efficient parallel programs are automatically synthesized. The goal of this research is to develop an optimization framework to derive high-performance implementations for a set of given tensor contractions. In particular, the issues investigated include: 1) Development of an efficient in-memory parallel algorithm for a tensor contraction. 2) Design of a performance-model-driven framework for a parallel out-of-core tensor contraction. 3) Design of an integrated optimization framework, which incorporates multiple high-level program transformation techniques to improve locality and parallelism for a set of tensor contractions. This subject includes the following three topics: 3.1) modeling interactions between different transformations and assessing their impact on the overall performance; 3.2) using a search-based strategy to explore the combined space of transformations, with efficient pruning methods to reduce the search space; 3.3) using performance-driven models to identify the best combination and generate high-performance code. 4) Incorporation of domain-specific optimizations, such as sparsity and symmetry. 
We have implemented a (open full item for complete abstract)

    Committee: Ponnuswamy Sadayappan (Advisor) Subjects: Computer Science
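
A tensor contraction of the kind the TCE synthesizes code for is just a multi-index generalization of matrix multiplication. The naive reference loop nest below (indices and shapes chosen for illustration) shows the computation that the dissertation's locality, parallelism, and out-of-core transformations restructure:

```python
def contract(A, B):
    """Tensor contraction C[i][j] = sum over k, l of A[i][k][l] * B[k][l][j].

    A has shape (I, K, L) and B has shape (K, L, J), stored as nested
    lists. The two shared indices k and l are summed away, leaving a
    two-index result -- the same pattern, at much larger scale, that
    TCE-generated electronic-structure code evaluates.
    """
    I, K, L = len(A), len(A[0]), len(A[0][0])
    J = len(B[0][0])
    C = [[0.0] * J for _ in range(I)]
    for i in range(I):
        for k in range(K):
            for l in range(L):
                a = A[i][k][l]
                for j in range(J):
                    C[i][j] += a * B[k][l][j]
    return C
```

The loop order here is one of many valid ones; choosing the order, tile sizes, and intermediate tensors that minimize memory traffic for a whole *set* of such contractions is precisely the search problem items 3.1-3.3 of the abstract describe.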
  • 6. Xu, Yi-Chang Parallel thinning algorithms and their implementation on hypercube machine

    Master of Science (MS), Ohio University, 1991, Electrical Engineering & Computer Science (Engineering and Technology)


    Committee: V.S. Raju (Advisor) Subjects: