Search Results

(Total results 9)

  • 1. Li, Haoyu Efficient Visualization for Machine-Learning-Represented Scientific Data

    Doctor of Philosophy, The Ohio State University, 2024, Computer Science and Engineering

    Recent progress in high-performance computing now allows researchers to run extremely high-resolution computational models that simulate detailed physical phenomena. Yet efficiently analyzing and visualizing the extensive data from these simulations is challenging. Adopting machine learning models to reduce the storage cost of large scientific data, or to extract salient features from it, has proven to be a successful approach to analyzing and visualizing these datasets effectively. Machine learning (ML) models such as neural networks and Gaussian process models are powerful tools for data representation. They can capture the internal structures, or "features", of a dataset, which is useful for compressing the data or exploring the subset of data that is of interest. However, applying machine learning models to scientific data brings new challenges to visualization. Machine learning models are usually computationally expensive: reconstructing a high-resolution scalar field on a dense grid from a neural network is costly, and Gaussian processes are notorious for their cubic time complexity in the number of data points. If other variables are included in the data modeling, for example the time dimension or the simulation parameters of ensemble data, the curse of dimensionality drives the computation cost even higher. The long inference time of machine learning models puts us in a dilemma between the high storage cost of the original data representation and the high computation cost of the machine learning representation. These challenges demonstrate a great need for techniques and algorithms that increase the speed of ML model inference. Rather than generic efforts to increase ML efficiency, such as better hardware acceleration or more efficient architectures, we tackle the more specific problem of how to query an ML model more efficiently for a specific scientific visualization task. In this dissertation, we c (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Hanqi Guo (Committee Member); Rephael Wenger (Committee Member) Subjects: Computer Engineering; Computer Science
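
    The inference-cost argument in the abstract above can be made concrete. Below is a minimal, illustrative sketch (not taken from the dissertation) of why exact Gaussian process inference scales cubically with the number of training points and grows with the size of the reconstruction grid; the RBF kernel, data sizes, and function names are assumptions for illustration.

    import numpy as np

    def rbf_kernel(a, b, length_scale=1.0):
        """Squared-exponential kernel between two point sets."""
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    def gp_posterior_mean(x_train, y_train, x_query, noise=1e-6):
        """Posterior mean at the query points. The Cholesky factorization of
        the n x n kernel matrix is the O(n^3) step that dominates inference."""
        K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
        L = np.linalg.cholesky(K)                          # O(n^3)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        K_s = rbf_kernel(x_query, x_train)                 # m x n cross-covariance
        return K_s @ alpha

    # Reconstructing a dense grid means evaluating every query point against
    # every training point, so both n and the grid resolution drive the cost.
    x_train = np.random.rand(500, 3)                       # 500 scattered 3D samples
    y_train = np.sin(x_train).sum(axis=1)
    grid = np.random.rand(64 ** 2, 3)                      # stand-in for one grid slice
    field_slice = gp_posterior_mean(x_train, y_train, grid)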
  • 2. Xu, Jiayi Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism

    Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering

    Extracting and visualizing features from scientific data can help scientists derive valuable insights. An extraction and visualization pipeline usually includes three steps: (1) scientific feature detection, (2) union-find for the features' connected-component labeling, and (3) visualization and analysis. As the scale of scientific data generated by experiments and simulations grows, it has become common practice to use distributed computing to handle large-scale data with data parallelism, where data is partitioned and distributed over parallel processors. Three challenges arise for feature extraction and visualization in scientific applications. First, traditional feature detectors may not be effective and robust enough to capture features of interest across different scientific settings, because scientific features are usually highly nonlinear and are recognized through domain scientists' soft knowledge. Second, existing union-find algorithms are either serial or not scalable enough to deal with the extreme-scale datasets generated in the modern era. Third, existing parallel feature extraction and visualization algorithms fail to automatically reduce communication costs when optimizing the performance of processing units. This dissertation studies scalable scientific feature extraction and visualization to tackle these three challenges. First, we design human-centric interactive visual analytics based on scientists' requirements to address domain-specific feature detection and tracking. We focus on an essential problem in the earth sciences: spatiotemporal analysis of viscous and gravitational fingers. Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like fluids. Previously, scientists mainly detected the finger features using density thresholding, in which they specify certain density thresholds and extract super-level sets from input density scalar fields. However, the results of density thresholding are sensitive to the select (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Rephael Wenger (Committee Member); Jian Chen (Committee Member) Subjects: Computer Engineering; Computer Science
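
    The union-find step named in the abstract above is the classic building block; the sketch below (illustrative only, not the dissertation's distributed, load-balanced variant) shows serial connected-component labeling of a thresholded 2D feature mask. The grid size and threshold are assumptions.

    import numpy as np

    class UnionFind:
        """Disjoint-set forest with path halving."""
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra != rb:
                self.parent[rb] = ra

    def label_components(mask):
        """Label 4-connected components of a 2D boolean feature mask."""
        h, w = mask.shape
        uf = UnionFind(h * w)
        for i in range(h):
            for j in range(w):
                if not mask[i, j]:
                    continue
                if i > 0 and mask[i - 1, j]:
                    uf.union((i - 1) * w + j, i * w + j)
                if j > 0 and mask[i, j - 1]:
                    uf.union(i * w + (j - 1), i * w + j)
        labels = np.full((h, w), -1, dtype=int)
        for i in range(h):
            for j in range(w):
                if mask[i, j]:
                    labels[i, j] = uf.find(i * w + j)
        return labels

    field = np.random.rand(64, 64)
    labels = label_components(field > 0.8)   # super-level set, then label it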
  • 3. Hazarika, Subhashis Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    Recent advancements in the field of computational sciences and high-performance computing have enabled scientists to design high-resolution computational models to simulate various real-world physical phenomena. In order to gain key scientific insights about the underlying phenomena, it is important to analyze and visualize the output data produced by such simulations. However, large-scale scientific simulations often produce output data whose size can range from a few hundred gigabytes to the scale of terabytes or even petabytes. Analyzing and visualizing such large-scale simulation data is not trivial. Moreover, scientific datasets are often multifaceted (multivariate, multi-run, multi-resolution, etc.), which can introduce additional complexities to the analysis and visualization activities. This dissertation addresses three broad categories of data analysis and visualization challenges: (i) multivariate distribution-based data summarization, (ii) uncertainty analysis in ensemble simulation data, and (iii) simulation parameter analysis and exploration. We propose statistical and machine learning-based approaches to overcome these challenges. A common strategy for dealing with large-scale simulation data is to partition the simulation domain and create data summaries in the form of statistical probability distributions. Instead of storing high-resolution raw data, storing the compact statistical data summaries reduces storage overhead and alleviates I/O bottlenecks. However, for multivariate simulation data, using standard multivariate distributions to create data summaries is not feasible. Therefore, we propose a flexible copula-based multivariate distribution modeling strategy to create multivariate data summaries during simulation execution time (i.e., in-situ data modeling). The resulting data summaries can subsequently be used to perform scalable post-hoc analysis and visualization. In many cases, scientists execute their simulations mu (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Rephael Wenger (Committee Member); Yusu Wang (Committee Member) Subjects: Computer Science; Statistics
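
    As a rough illustration of the copula-based summarization idea in the abstract above (not the dissertation's actual in-situ method), the sketch below fits a Gaussian copula plus empirical marginals to one spatial block of multivariate samples and then draws synthetic samples from the compact summary; variable names and sizes are assumptions.

    import numpy as np
    from scipy import stats

    def fit_copula_summary(block):
        """block: (n_samples, n_vars) values from one spatial partition."""
        n, d = block.shape
        u = (stats.rankdata(block, axis=0) - 0.5) / n          # ranks mapped into (0, 1)
        z = stats.norm.ppf(u)                                  # normal scores
        corr = np.corrcoef(z, rowvar=False)                    # dependence structure
        marginals = [np.sort(block[:, k]) for k in range(d)]   # empirical CDFs
        return {"corr": corr, "marginals": marginals}

    def sample_summary(summary, m):
        """Draw m synthetic multivariate samples from the compact summary."""
        d = summary["corr"].shape[0]
        z = np.random.multivariate_normal(np.zeros(d), summary["corr"], size=m)
        u = stats.norm.cdf(z)
        cols = [np.quantile(summary["marginals"][k], u[:, k]) for k in range(d)]
        return np.column_stack(cols)

    block = np.random.rand(1000, 3)          # e.g. pressure, temperature, speed
    summary = fit_copula_summary(block)      # stored instead of the raw block
    samples = sample_summary(summary, 200)   # used for post-hoc analysis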
  • 4. Wang, Ko-Chih Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis

    Doctor of Philosophy, The Ohio State University, 2019, Computer Science and Engineering

    The advent of high-performance supercomputers enables scientists to perform extreme-scale simulations that generate millions of cells and thousands of time steps. By exploring and analyzing the simulation outputs, scientists can gain a deeper understanding of the modeled phenomena. When the size of the simulation output is small, the common practice is simply to move the data to the machines that perform post analysis. However, as the size of the data grows, the limited bandwidth and capacity of the networking and storage devices that connect the supercomputer to the analysis machine become a major bottleneck. Therefore, visualizing and analyzing large-scale simulation datasets poses significant challenges. This dissertation addresses this big-data challenge with distribution-based in-situ techniques: the same supercomputer resources are used to analyze the raw data and generate compact data proxies that statistically summarize the raw data with distributions. Only the compact data proxies are moved to the post-analysis machine, overcoming the bottleneck. Because the distribution-based data representation preserves statistical data properties, it has the potential to facilitate flexible post-hoc data analysis and enable uncertainty quantification. We first focus on the problem of rendering large data volumes on resource-limited post-analysis machines. To tackle the limited I/O bandwidth and storage space, distributions are used to summarize the data. When visualizing the data, importance sampling is proposed to draw a small number of samples and minimize the demand for computational power. The error of the proxies is quantified and visually presented to scientists through uncertainty animation. We also tackle the problem of reducing the error introduced when approximating spatial information in distribution-based representations. This error can lower visualization quality and hinder data exploration. The basic distribution-based appro (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor) Subjects: Computer Engineering; Computer Science
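
    To make the proxy idea above concrete, here is a minimal sketch (illustrative only; the dissertation's proxies and sampling criteria are more sophisticated) that summarizes one block with a histogram and then importance-samples it, biasing toward rare bins; the bin count and bias rule are assumptions.

    import numpy as np

    def summarize_block(values, bins=32):
        """Replace one block's raw values with a compact histogram proxy."""
        counts, edges = np.histogram(values, bins=bins)
        return counts, edges

    def importance_sample(counts, edges, n_samples=100):
        """Draw samples from the histogram, biased toward rare bins, which are
        often the salient ones for visualization."""
        probs = counts / counts.sum()
        weights = probs / (probs + 1e-9)      # up-weights rare, non-empty bins
        weights /= weights.sum()
        bin_ids = np.random.choice(len(counts), size=n_samples, p=weights)
        lo, hi = edges[bin_ids], edges[bin_ids + 1]
        return np.random.uniform(lo, hi)      # jitter samples within each bin

    raw = np.random.randn(100_000)            # stand-in for one block's raw data
    counts, edges = summarize_block(raw)      # only this proxy leaves the supercomputer
    samples = importance_sample(counts, edges)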
  • 5. Chaudhuri, Abon Geometric and Statistical Summaries for Big Data Visualization

    Doctor of Philosophy, The Ohio State University, 2013, Computer Science and Engineering

    In recent times, the visualization and data analysis paradigm has been adapting fast to keep up with the rapid growth in computing power and data size. Modern scientific simulations run at massive scale to produce huge datasets, which must be analyzed and visualized by domain experts to continue innovation. In the presence of large-scale data, it is important to identify and extract the informative regions at an early stage so that subsequent analysis algorithms, which are usually memory- and compute-intensive, can focus only on those regions. Transforming the raw data into a compact yet meaningful representation also helps maintain the interactivity of queries and the visualization of analysis results. In this dissertation, we propose a novel, general-purpose framework for exploring large-scale data. We propose to use importance-based data summaries, which can substitute for the raw data to answer queries and drive visual exploration. Since the definition of importance depends on the nature of the data and the task at hand, we propose using suitable statistical and geometric measures, or combinations of measures, to quantify importance and perform data reduction on scalar and vector field data. Our research demonstrates two instances of the proposed framework. The first instance applies to large numbers of streamlines computed from vector fields. We make the visual exploration of such data much easier than navigating through a cluttered 3D visualization of the raw data. In this case, we introduce a fractal-dimension-based metric called the box counting ratio, which quantifies the geometric complexity of streamlines (or parts of streamlines) by their space-filling capacity. We utilize this metric to extract, organize, and visualize streamlines of varying density and complexity hidden in a large number of streamlines. The extracted complex regions of the streamlines represent the data summaries in this case. We organize and present them (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor); Roger Crawfis (Committee Member); Rephael Wenger (Committee Member); Tom Peterka (Committee Member) Subjects: Computer Science
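
    The box counting ratio mentioned in the abstract above can be illustrated with a small sketch (not the dissertation's exact definition; the box sizes and example curves are assumptions): a curve that fills space more densely touches disproportionately more fine-scale boxes than coarse-scale boxes.

    import numpy as np

    def count_boxes(points, box_size):
        """Number of grid cells of the given size touched by a polyline's points."""
        cells = np.floor(points / box_size).astype(int)
        return len({tuple(c) for c in cells})

    def box_counting_ratio(points, coarse=0.2, fine=0.05):
        """Ratio of fine- to coarse-scale box counts; higher values indicate a
        more space-filling (geometrically complex) streamline."""
        return count_boxes(points, fine) / max(count_boxes(points, coarse), 1)

    t = np.linspace(0, 4 * np.pi, 2000)
    straight = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
    helix = np.column_stack([np.cos(5 * t), np.sin(5 * t), 0.1 * t])
    print(box_counting_ratio(straight), box_counting_ratio(helix))  # helix scores higher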
  • 6. Lee, Teng-Yok Data Triage and Visual Analytics for Scientific Visualization

    Doctor of Philosophy, The Ohio State University, 2011, Computer Science and Engineering

    As the speed of computers continues to increase at a very fast rate, the size of data generated by scientific simulations has now reached petabytes ($10^{15}$ bytes) and beyond. Under such circumstances, no existing techniques can be used to perform effective data analysis at full precision. To analyze large-scale data sets, visual analytics techniques with effective summarization and flexible interfaces are crucial for assisting the exploration of data at different levels of detail. To improve data access efficiency, summarization and triage are important components for categorizing data items according to their saliency, allowing the user to focus only on the relevant portion of the data. In this dissertation, several visualization and analysis techniques are presented to facilitate the analysis of multivariate time-varying data and flow fields. For multivariate time-varying data sets, data items are categorized based on their values over time to provide an effective overview of the time-varying phenomena. Based on similarity to a user-specified feature, dynamic phenomena across multiple variables in different spatial and temporal domains can be explored. To visualize flow fields, information theory is used to model local flow complexity quantitatively. Based on this model, an information-aware visualization framework is designed to create images with different levels of visual focus according to the local flow complexity. By extending the measurement from object space to image space, visualization primitives can be further rearranged, leading to more effective visualization of salient flow features with less occlusion.

    Committee: Han-Wei Shen PhD (Advisor); Roger A. Crawfis PhD (Committee Chair); Raghu Machiraju PhD (Committee Chair) Subjects: Computer Science
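
    As a small illustration of the information-theoretic flow-complexity idea above (a sketch only; the dissertation's framework, neighborhood size, and bin count differ), one can take the Shannon entropy of the vector-direction histogram in a local neighborhood: aligned flow gives low entropy, while mixed directions near critical or turbulent regions give high entropy.

    import numpy as np

    def direction_entropy(vectors, bins=16):
        """Shannon entropy (bits) of the 2D direction histogram of a local
        neighborhood of flow vectors."""
        angles = np.arctan2(vectors[:, 1], vectors[:, 0])
        counts, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    aligned = np.tile([1.0, 0.1], (200, 1))   # near-parallel flow: low entropy
    chaotic = np.random.randn(200, 2)         # spread-out directions: high entropy
    print(direction_entropy(aligned), direction_entropy(chaotic))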
  • 7. Woodring, Jonathan Visualization of Time-varying Scientific Data through Comparative Fusion and Temporal Behavior Analysis

    Doctor of Philosophy, The Ohio State University, 2009, Computer Science and Engineering

    Visualization of time-varying scientific and medical data has traditionally been done through animation or a series of still-frame renders. Animation and still-frame comparison are only minimally sufficient for effectively finding and comparing temporal trends, due to limitations such as short-term visual memory and the lack of analytical feedback. To improve time-varying analysis, several different visualization methods are described. For direct visual comparison of individual time steps, we introduce a rendering technique that fuses multiple time steps into a single dataset by projection and composition methods. This can be achieved through projection along time, and further generalized to high-dimensional space-time projection. Furthermore, time volumes (or multivariate data) can be compared through composition and set operations. To aid the understanding of comparative time volumes, focus+context animation is used to reveal features in the data by exploiting human motion-perception capabilities. In addition to these comparative and highlighting techniques, we also provide quantitative analysis of time-varying data via temporal behavior classification. We allow a user to visualize and explore their time-varying data as classes of multi-scale temporal trends. Through the analysis of time activity, we can also semi-automatically generate classifications (transfer functions) to be used in the visualization pipeline.

    Committee: Han-Wei Shen PhD (Advisor); Roger Crawfis PhD (Committee Member); Rick Parent PhD (Committee Member) Subjects: Computer Science
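
    The projection-along-time and behavior-classification ideas above can be sketched as follows (illustrative only; the dissertation's composition operators and multi-scale trend classification are richer, and the data shape and cluster count are assumptions).

    import numpy as np

    def temporal_max_projection(volume_t):
        """Collapse a (T, Z, Y, X) time-varying volume into a single volume by
        taking each voxel's maximum over time."""
        return volume_t.max(axis=0)

    def classify_time_behavior(volume_t, n_classes=4, iters=10):
        """Group voxels by how their values evolve, using a tiny k-means over
        per-voxel time series."""
        T = volume_t.shape[0]
        series = volume_t.reshape(T, -1).T                # one row per voxel
        centers = series[np.random.choice(len(series), n_classes, replace=False)]
        for _ in range(iters):
            dists = ((series[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dists.argmin(axis=1)
            for k in range(n_classes):
                if np.any(labels == k):
                    centers[k] = series[labels == k].mean(axis=0)
        return labels.reshape(volume_t.shape[1:])

    data = np.random.rand(20, 16, 16, 16)     # stand-in time-varying scalar field
    fused = temporal_max_projection(data)     # one comparable image of all time steps
    classes = classify_time_behavior(data)    # per-voxel temporal-trend labels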
  • 8. Mehta, Sameep Realizing a feature-based framework for scientific data mining

    Doctor of Philosophy, The Ohio State University, 2006, Computer and Information Science

    The focus in the computational sciences has been on developing algorithms and tools to facilitate large-scale, realistic simulations of physical processes. These tools can also simulate physical processes at very fine temporal and spatial resolutions, resulting in huge time-varying datasets. These datasets, if analyzed properly, hold great potential for knowledge discovery. In this dissertation, a feature-based framework for analyzing scientific datasets is realized. The main components of the framework are: feature detection, feature classification, feature verification, and modeling the evolutionary behavior of the features. The usefulness of the first three steps is shown on datasets originating from computational molecular dynamics. Modeling the evolutionary behavior of the features involves (i) understanding the trajectory of an individual feature, (ii) discovering the changes that features undergo due to interactions with other features, and finally, (iii) understanding and deriving various spatio-temporal relationships among features. A rule-based feature detection algorithm is presented to extract the defect structures from molecular dynamics datasets. A two-step, shape-based classifier assigns a label to each extracted feature. To distinguish actual features from spurious ones, visualization techniques are employed. The feature extraction algorithm is robust in the presence of noise and detects the same features in noisy and noise-free datasets. Moreover, the algorithm is capable of in vivo processing of the data. Next, we describe an algorithm for extracting a meaningful representation of object trajectories. We take into account the shape and the size of the object. The trajectory of a feature is represented using physically meaningful parameters: linear velocity, angular velocity, and scale. Next, we present a scheme to discover critical events such as merging and creation. The results are again presented on molecular dynamics and fluid flow datasets. Finally (open full item for complete abstract)

    Committee: Srinivasan Parthasarathy (Advisor); Raghu Machiraju (Advisor) Subjects: Computer Science
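
    The trajectory representation described in the abstract above can be illustrated with a small sketch (not the dissertation's method; the per-step inputs and example values are assumptions): given a tracked feature's centroid, orientation, and size at each time step, its motion reduces to linear velocity, angular velocity, and scale change.

    import numpy as np

    def trajectory_parameters(centroids, orientations, sizes, dt=1.0):
        """Per-step linear velocity, angular velocity, and scale change for one
        tracked feature, from per-time-step centroids (N, 3), orientation
        angles (N,), and sizes (N,)."""
        linear_velocity = np.diff(centroids, axis=0) / dt
        angular_velocity = np.diff(np.unwrap(orientations)) / dt
        scale_change = np.diff(sizes) / dt
        return linear_velocity, angular_velocity, scale_change

    t = np.arange(10, dtype=float)
    centroids = np.column_stack([0.5 * t, np.sin(t), np.zeros_like(t)])
    orientations = 0.1 * t                    # slow, steady rotation
    sizes = 1.0 + 0.05 * t                    # feature gradually grows
    v, w, s = trajectory_parameters(centroids, orientations, sizes)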
  • 9. Tejwani, Kamal An Extensible Graphical User Interface

    MS, Kent State University, 2008, College of Arts and Sciences / Department of Computer Science

    Traditionally, simulations that require large computing resources are run non-interactively. A typical process involves creating a text file describing the initial conditions and parameters for the simulation, submitting the simulation to a batch queue, and waiting until enough resources are available to run it. The simulation runs based entirely on the input file provided and writes the results to disk for later examination. While this technique may be suitable for some forms of investigation, for others it can lead to a very inefficient use of resources. A better approach would be to allow the user to research, investigate, calibrate, and control long-running, resource-intensive applications at runtime – a process called Computational Steering. Most steering packages have a very restricted and complex interface, making it confusing for users to monitor and edit the parameters in their simulation, or even to know which parameters they can modify during the simulation. It is next to impossible to understand and work with such an interface without expertise in computer programming. This creates the need for an interface that makes this interactive process simple, transparent, and efficient. An Extensible GUI for computational steering attempts to solve this problem by providing a minimally invasive interface that allows users not only to see a list of the parameters they can interact with during a simulation, but also to monitor the values of some parameters and, if necessary, to edit the values of other parameters from local and remote machines, without being computer programming experts. This GUI is designed to work with the CUMULVS (Collaborative User Migration, User Library for Visualization and Steering) package and the DisCOV Steering Library system designed in Dr. Ruttan and Dr. Farrell's lab, and requires only relatively small modifications.

    Committee: Dr. Arden Ruttan PhD (Advisor); Dr. Paul Farrell PhD (Committee Member); Dr. Austin Melton Jr. PhD (Committee Member) Subjects: Computer Science