Doctor of Philosophy, The Ohio State University, 2010, Computer Science and Engineering
Recent advances in digital sensor technology and numerical simulations
of real-world phenomena are resulting in the acquisition of unprecedented
amounts of raw digital data. Terms like ‘data explosion’ and ‘data tsunami’
have come to describe the uncontrolled rate at which scientific datasets
are generated by automated sources ranging from digital microscopes and
telescopes to in-silico models simulating the complex dynamics of physical
and biological processes. Scientists in various domains now have secure,
affordable access to petabyte-scale observational data gathered over time,
the analysis of which is crucial to scientific discovery.
The availability of commodity components has fostered the development of
large distributed systems with high-performance computing resources to
support the execution requirements of scientific data analysis applications.
Increased levels of middleware support over the years have aimed to provide
high scalability of application execution on these systems. However, the
high-resolution, multi-dimensional nature of scientific datasets and the
complexity of analysis requirements present challenges to efficient
application execution on such systems. Traditional brute-force analysis
techniques to extract useful information from scientific datasets
may no longer meet desired performance levels at extreme data scales.
This thesis builds on a comprehensive study involving multi-dimensional data
analysis applications at large data scales and identifies a set of advanced
factors, or parameters, for this class of applications that can be customized
in domain-specific ways to obtain substantial improvements in performance.
A useful property of these applications
is their ability to operate at multiple performance levels based on a set of
trade-off parameters, while providing different levels of quality-of-service
(QoS) specific to the application instance. To avail the performance benefits
brought about by such facto...
Committee: P Sadayappan PhD (Advisor); Joel Saltz MD, PhD (Committee Member); Gagan Agrawal PhD (Committee Member); Umit Catalyurek PhD (Committee Member)
Subjects: Computer Science