Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data

Hazarika, Subhashis

2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Recent advancements in computational science and high-performance computing have enabled scientists to design high-resolution computational models that simulate a wide range of real-world physical phenomena. To gain key scientific insights into these phenomena, it is important to analyze and visualize the output data produced by such simulations. However, large-scale scientific simulations often produce output data ranging from a few hundred gigabytes to terabytes or even petabytes, and analyzing and visualizing data at this scale is not trivial. Moreover, scientific datasets are often multifaceted (multivariate, multi-run, multi-resolution, etc.), which can introduce additional complexity into analysis and visualization. This dissertation addresses three broad categories of data analysis and visualization challenges: (i) multivariate distribution-based data summarization, (ii) uncertainty analysis in ensemble simulation data, and (iii) simulation parameter analysis and exploration. We propose statistical and machine learning-based approaches to overcome these challenges.

A common strategy for dealing with large-scale simulation data is to partition the simulation domain and create data summaries in the form of statistical probability distributions. Storing these compact statistical summaries instead of the high-resolution raw data reduces storage overhead and alleviates I/O bottlenecks. However, for multivariate simulation data, using standard multivariate distributions to create data summaries is not feasible. Therefore, we propose a flexible copula-based multivariate distribution modeling strategy to create multivariate data summaries during simulation execution (i.e., in-situ data modeling). The resulting data summaries can subsequently be used to perform scalable post-hoc analysis and visualization.
In many cases, scientists run their simulations multiple times with different initial conditions and/or input parameters in order to model the uncertainty of the underlying physical phenomena. Analyzing this collection of simulation outputs, generally referred to as ensemble simulation data, can be overwhelming for scientists. To this end, we propose a copula-based approach that models uncertainty in ensemble simulations using mixed statistical distribution models while preserving the spatial correlation among local neighbors. We use this statistical model to extract and visualize the uncertainty of features such as isocontours and vortices in ensemble simulation data. Moreover, to guide users in identifying interesting features for further uncertainty analysis, we propose a two-stage information-theoretic framework for exploring scalar value ranges as well as the corresponding ensemble isocontours of selected values.

Finally, for many newly designed simulation models, it is important to properly calibrate the simulation input parameters before applying the models in real scientific studies. For computationally expensive simulations, such exploratory parameter analyses can become prohibitively costly. Therefore, we propose a neural-network-assisted visual analysis framework to enable interactive simulation parameter analysis. A trained neural network acts as a surrogate model that replaces the expensive simulation, facilitating interactive exploratory analysis. We collaborated with computational biologists to help them analyze an expensive yeast cell polarization simulation model.
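The abstract does not detail how the copula model yields the uncertainty of ensemble isocontours. As a simpler point of reference, and explicitly not the dissertation's copula-based method, the sketch below estimates a per-cell level-crossing probability empirically: the fraction of ensemble members whose isocontour at a chosen isovalue passes through each cell of a 2D grid. The function name and the grid layout (`field[row][col]` vertex values) are hypothetical.

```python
from typing import List

# Hypothetical layout: one ensemble member is field[row][col] scalar values at grid vertices.
Grid = List[List[float]]


def level_crossing_probability(members: List[Grid], isovalue: float) -> List[List[float]]:
    """Empirical fraction of ensemble members whose isocontour crosses each cell.

    Simple Monte Carlo-style baseline, not a copula-based model.
    """
    rows, cols = len(members[0]), len(members[0][0])
    counts = [[0] * (cols - 1) for _ in range(rows - 1)]
    for field in members:
        for i in range(rows - 1):
            for j in range(cols - 1):
                corners = (field[i][j], field[i][j + 1],
                           field[i + 1][j], field[i + 1][j + 1])
                above = sum(v >= isovalue for v in corners)
                # The contour crosses the cell when its corners lie on both sides.
                if 0 < above < 4:
                    counts[i][j] += 1
    n = len(members)
    return [[c / n for c in row] for row in counts]
```

Cells with probabilities near 0 or 1 are places where the ensemble members agree; intermediate values mark where the isocontour's position is uncertain. A statistical model such as the copula-based one described above replaces the raw member counts with probabilities derived from a fitted distribution, which also makes spatial correlation between neighboring vertices explicit.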
Han-Wei Shen (Advisor)
Rephael Wenger (Committee Member)
Yusu Wang (Committee Member)
190 p.

Recommended Citations

  • Hazarika, S. (2019). Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574692702479196

    APA Style (7th edition)

  • Hazarika, Subhashis. Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1574692702479196.

    MLA Style (8th edition)

  • Hazarika, Subhashis. "Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574692702479196

    Chicago Manual of Style (17th edition)