Ensemble simulations are becoming prevalent in various scientific and engineering disciplines, such as computational fluid dynamics, aerodynamics, and climate and weather research. Scientists routinely run a set of simulations with different configurations (e.g., initial/boundary conditions, parameter settings, or phenomenological models) and produce an ensemble of simulation outputs, namely an ensemble dataset. Ensemble datasets are extremely useful for studying the uncertainty of simulation models and the sensitivity of the outputs to initial conditions and parameters. However, compared with deterministic scientific simulation data, ensemble datasets are challenging to visualize and analyze because they introduce extra dimensions into the field data (i.e., each spatial location is associated with multiple possible values instead of a single deterministic value) and extra facets (e.g., simulation parameters).
Over the last decade, various approaches have been proposed to visualize and analyze ensemble datasets from different perspectives. For example, a collection of techniques models and visualizes the variability of isocontours. Coordinated multiple views are frequently used to show the simulation parameters and outputs simultaneously, with the views linked so that the influence of different simulation parameters can be studied. However, more work is needed in ensemble data visualization and analysis to handle different types of ensemble datasets (e.g., unstructured grid data, time-varying data, and extreme-scale data) and to address various visualization tasks (e.g., uncertainty modeling and parameter space exploration).
In this dissertation, we focus on visual exploration and analysis of ensemble datasets using statistical and deep learning models. Specifically, we explore and analyze ensemble datasets from three perspectives. First, we focus on modeling and visualizing the variability of ensemble members for 1) features (e.g., isosurfaces) derived from the field datasets and 2) raw simulation outputs (i.e., field datasets). For the derived features, especially surface-like features (e.g., isosurfaces and stream surfaces), we propose a statistical approach to study their variability. We model the positional uncertainty of the surfaces by extending kernel density estimation (KDE) from discrete data points to the infinite set of points on the input surfaces. For the field datasets, we focus on modeling and visualizing the variability of scalar values at each spatial location. To this end, we treat the scalar value at each spatial location as an independent random variable and model its distribution using KDE. To visualize and explore the distributions, we decompose each distribution over a few representative subranges and summarize it by the cumulative probability of each subrange. We then build a visual interface to explore and analyze the field of distributions based on these cumulative probabilities. Second, we study the influence of different initial conditions, parameterizations, and/or phenomenological models on the simulation outputs. To model the mapping between simulation inputs and outputs, which is often a highly complex and nonlinear function, we adopt deep learning techniques. With the assistance of trained deep learning models, users can predict the visualization of simulation outputs for arbitrary simulation inputs and study the sensitivity of the outputs to different simulation parameters through interactive exploration.
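The per-location modeling described under the first perspective can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a normal-reference bandwidth rule, neither of which is specified in the text, and the names `rule_of_thumb_bandwidth`, `subrange_probability`, and `summarize_field` are hypothetical. For a Gaussian KDE, the probability mass over a subrange has a closed form as an average of normal CDF differences, so no numerical integration is needed.

```python
import math
from statistics import stdev


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth (an assumed choice, not taken from the text).

    Needs at least two samples; a tiny floor avoids division by zero
    when all ensemble members agree exactly at a location.
    """
    n = len(samples)
    return max(1.06 * stdev(samples) * n ** (-1 / 5), 1e-12)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def subrange_probability(samples, lo, hi, bandwidth):
    """Probability mass a Gaussian KDE over `samples` assigns to [lo, hi]."""
    total = sum(
        normal_cdf((hi - x) / bandwidth) - normal_cdf((lo - x) / bandwidth)
        for x in samples
    )
    return total / len(samples)


def summarize_field(ensemble, edges):
    """Summarize each location's distribution by subrange probabilities.

    ensemble: list of members, each a flat list with one scalar per location.
    edges:    increasing subrange boundaries; consecutive pairs form subranges.
    Returns one list of probabilities (one per subrange) for every location.
    """
    summary = []
    for loc in range(len(ensemble[0])):
        # Each location is treated as an independent random variable.
        samples = [member[loc] for member in ensemble]
        h = rule_of_thumb_bandwidth(samples)
        summary.append(
            [subrange_probability(samples, a, b, h) for a, b in zip(edges, edges[1:])]
        )
    return summary


# Toy ensemble (made-up values): 4 members, 2 spatial locations.
ensemble = [[0.4, 2.0], [0.5, 2.1], [0.6, 1.9], [0.5, 2.2]]
edges = [-math.inf, 0.0, 1.0, math.inf]  # three illustrative subranges
summary = summarize_field(ensemble, edges)
```

Each entry of `summary` is a short probability vector that could drive a per-location glyph or color map. How the representative subranges are chosen is left open here.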
Third, we extend our study from visualizing and analyzing members within an ensemble to comparing multiple ensembles, focusing on how, when, and where the ensembles agree or disagree with each other. To this end, we train a discriminative network to differentiate members of one ensemble from members of the other. After training, we use the discriminative network to approximate the overall difference between the two ensembles and to identify the members and spatial locations where the two ensembles agree or disagree. This dissertation ends with a summary of our contributions and potential future research directions.
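The idea of comparing two ensembles with a discriminator can be sketched on toy data. The example below swaps the discriminative network for a plain logistic-regression classifier, a deliberate simplification and not the dissertation's model. All names (`make_ensemble`, `train_linear_discriminator`, `accuracy`) and the synthetic data are hypothetical. Classification accuracy stands in for the overall difference between the ensembles, and weight magnitudes hint at the spatial locations where they differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_ensemble(rng, n_members, n_loc, shifted_location=None):
    """Synthetic members: small noise everywhere, optional +1 shift at one location."""
    members = []
    for _ in range(n_members):
        values = [rng.gauss(0.0, 0.05) for _ in range(n_loc)]
        if shifted_location is not None:
            values[shifted_location] += 1.0
        members.append(values)
    return members


def train_linear_discriminator(ens_a, ens_b, lr=0.5, epochs=500):
    """Logistic regression via batch gradient descent (label 0: ens_a, 1: ens_b)."""
    data = [(m, 0.0) for m in ens_a] + [(m, 1.0) for m in ens_b]
    n_loc = len(ens_a[0])
    w, b = [0.0] * n_loc, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_loc, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_loc):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def accuracy(w, b, ens_a, ens_b):
    """Fraction of members assigned to the correct ensemble (training data)."""
    data = [(m, 0) for m in ens_a] + [(m, 1) for m in ens_b]
    correct = 0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(data)


rng = random.Random(0)
ens_a = make_ensemble(rng, 20, 4)
ens_b = make_ensemble(rng, 20, 4, shifted_location=2)  # differs only at location 2
w, b = train_linear_discriminator(ens_a, ens_b)
acc = accuracy(w, b, ens_a, ens_b)
# Location with the largest absolute weight: where the classifier looks most.
most_different = max(range(len(w)), key=lambda j: abs(w[j]))
```

An accuracy near 0.5 would suggest that the two ensembles are hard to tell apart, while an accuracy near 1.0 suggests a clear difference. A deep network would need a different attribution method than reading off weights.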