Files
Jiayi_Xu_PhD_Dissertation.pdf (22.99 MB)
Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism
Author Info
Xu, Jiayi
ORCID® Identifier
http://orcid.org/0000-0002-9091-6412
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu163875260837876
Abstract Details
Year and Degree
2021, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
Extracting and visualizing features from scientific data can help scientists derive valuable insights. An extraction and visualization pipeline usually includes three steps: (1) scientific feature detection, (2) union-find for connected component labeling of features, and (3) visualization and analysis. As the scale of scientific data generated by experiments and simulations grows, it has become common practice to handle large-scale data with data-parallel distributed computing, where data is partitioned and distributed over parallel processors. Three challenges arise for feature extraction and visualization in scientific applications. First, traditional feature detectors may not be effective or robust enough to capture features of interest across different scientific settings, because scientific features are usually highly nonlinear and are recognized through domain scientists' soft knowledge. Second, existing union-find algorithms are either serial or not scalable enough to handle the extreme-scale datasets generated today. Third, existing parallel feature extraction and visualization algorithms fail to automatically reduce communication costs when optimizing the performance of processing units.

This dissertation studies scalable scientific feature extraction and visualization to tackle these three challenges.

First, we design human-centric interactive visual analytics based on scientists' requirements to address domain-specific feature detection and tracking. We focus on an essential problem in earth sciences: spatiotemporal analysis of viscous and gravitational fingers. Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like structures. Previously, scientists mainly detected finger features using density thresholding: they specify density thresholds and extract super-level sets from the input density scalar fields.
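The density-thresholding baseline, combined with the union-find labeling step of the pipeline, can be illustrated on a toy 2D grid. This is a minimal sketch, not the dissertation's implementation: it assumes 4-connectivity, uses a hypothetical function name `super_level_set_components`, and a made-up density field in which two "fingers" share a weaker base.

```python
from typing import List


def super_level_set_components(field: List[List[float]], threshold: float) -> int:
    """Count 4-connected components of the super-level set {x : field(x) >= threshold}."""
    rows, cols = len(field), len(field[0])
    parent = list(range(rows * cols))  # union-find forest over grid cells

    def find(i: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller index becomes the root

    inside = [[v >= threshold for v in row] for row in field]
    for r in range(rows):
        for c in range(cols):
            if not inside[r][c]:
                continue
            # Check only right and down neighbors so each edge is visited once.
            if c + 1 < cols and inside[r][c + 1]:
                union(r * cols + c, r * cols + c + 1)
            if r + 1 < rows and inside[r + 1][c]:
                union(r * cols + c, (r + 1) * cols + c)

    roots = {find(r * cols + c) for r in range(rows) for c in range(cols) if inside[r][c]}
    return len(roots)


# Hypothetical density field: fingers in columns 0 and 2 joined by a weaker base.
density = [
    [0.9, 0.1, 0.8],
    [0.9, 0.1, 0.8],
    [0.7, 0.6, 0.7],
]
print([super_level_set_components(density, t) for t in (0.5, 0.75, 0.85, 0.95)])
# [1, 2, 1, 0]
```

On this toy field the same structure is read as one finger, two fingers, one finger, or none depending on the threshold, which illustrates the sensitivity to threshold choice noted in the abstract.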
However, the results of density thresholding are sensitive to the selected threshold values, and a few individual threshold values are usually not sufficient to extract and track satisfactory time-varying finger features. In our study, scientists can detect and visualize spatiotemporal fingers interactively to elucidate the dynamics of the flow instabilities. This study has two main contributions. (1) We propose a ridge-guided detection method that extracts the curvilinear geometry and branching topology of fingers, providing richer geometric structures than density thresholding. (2) We devise an interactive visual-analytics system with geometric-glyph-augmented tracking graphs that allows scientists to explore how fingers and their branches grow, merge, and split over both space and time. Feedback from earth scientists demonstrates the efficacy of our approach for spatiotemporal geometry-driven analyses of fingers.

Second, we improve the scalability of union-find algorithms using asynchronous and load-balanced parallelism. Union-find is widely used in scientific feature extraction and visualization techniques, such as tracking critical points and extracting level sets. However, distributed and parallel union-find can suffer from high synchronization costs and imbalanced workloads across participating processors. We present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable scientific feature extraction and visualization. We prove that the global synchronizations in existing distributed union-find algorithms can be eliminated without changing the final results, which allows communication and computation to overlap. We also use a k-d tree decomposition to redistribute inputs and improve workload balance. We benchmark the scalability of our algorithm with up to 1,024 processors using both synthetic and application data.
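The idea behind k-d tree redistribution can be illustrated with recursive median cuts over point-like work items. This minimal sketch assumes every item has equal cost and that the number of processors is a power of two; `kd_partition` is a hypothetical name, and the decomposition used in the dissertation may choose its cuts differently.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def kd_partition(points: List[Point], num_parts: int, depth: int = 0) -> List[List[Point]]:
    """Split points into num_parts near-equal-count groups by recursive median
    cuts, cycling through coordinate axes as in a k-d tree."""
    if num_parts < 1 or num_parts & (num_parts - 1):
        raise ValueError("num_parts must be a power of two")
    if num_parts == 1:
        return [points]
    if len(points) < num_parts:
        raise ValueError("need at least one point per part")
    axis = depth % len(points[0])            # alternate x, y, ... by depth
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                  # median cut balances the counts
    half = num_parts // 2
    return (kd_partition(ordered[:mid], half, depth + 1)
            + kd_partition(ordered[mid:], half, depth + 1))


# Clustered work items: a regular 2x2 grid over [0, 10]^2 would put six of
# these eight points into one cell, but median cuts give every part two.
items = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
         (0.15, 0.05), (0.25, 0.35), (5.0, 5.0), (9.0, 1.0)]
parts = kd_partition(items, 4)
print([len(p) for p in parts])  # [2, 2, 2, 2]
```

Cutting at the median of the data, rather than at the midpoint of the domain, is what keeps part sizes equal when the inputs are spatially clustered.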
We demonstrate the use of our algorithm in critical point tracking with high-speed imaging experiments and in super-level set extraction with fusion plasma simulations.

Third, we take communication costs into account in parallel algorithm design. We explore an online reinforcement learning (RL) paradigm that dynamically optimizes parallel particle tracing performance on distributed-memory systems by reducing I/O and communication costs. Our method combines three novel components: (1) a workload donation model, (2) a high-order workload estimation model, and (3) a communication cost model. First, our RL-based workload donation model monitors the workloads of processors and creates RL agents that donate particles and data blocks from high-workload processors to low-workload processors to minimize execution time. The RL agents learn the donation strategy on the fly from reward and cost functions, which account for processors' workload changes and data transfer costs for every donation action. Second, we propose an online workload estimation model that helps our RL model estimate the workload distribution of processors in future computations. Third, we use a communication cost model that considers both block and particle data exchange costs, helping the agents make effective decisions while minimizing communication costs. We demonstrate that our algorithm adapts to different flow behaviors in large-scale fluid dynamics, ocean, and weather simulation data. In evaluations with up to 16,384 processors, our algorithm improves parallel particle tracing performance in terms of parallel efficiency, load balance, and I/O and communication costs.
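The trade-off that the reward and cost functions capture (workload gain versus data transfer cost per donation) can be illustrated without any learning. The sketch below is a greedy stand-in, not the dissertation's RL method. It assumes scalar workloads, uses the maximum load as a proxy for execution time, charges a hypothetical linear `cost_per_unit` for moving work, and uses quadratic extrapolation as one hypothetical form of "high-order" workload estimation. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def extrapolate_workload(history: List[float]) -> float:
    """Estimate the next workload by polynomial extrapolation of recent samples
    (quadratic with three or more samples, linear with two)."""
    if len(history) >= 3:
        a, b, c = history[-3:]
        return max(0.0, 3 * c - 3 * b + a)
    if len(history) == 2:
        return max(0.0, 2 * history[-1] - history[-2])
    return history[-1] if history else 0.0


def donation_reward(loads: List[float], src: int, dst: int,
                    amount: float, cost_per_unit: float) -> float:
    """Drop in the maximum load (a proxy for execution time) minus a linear
    transfer cost for moving `amount` units of work from src to dst."""
    after = list(loads)
    after[src] -= amount
    after[dst] += amount
    return (max(loads) - max(after)) - cost_per_unit * amount


def best_donation(loads: List[float], cost_per_unit: float,
                  step: float = 1.0) -> Tuple[float, Optional[Tuple[int, float]]]:
    """Greedy search over donations from the most-loaded processor.
    Returns (reward, (dst, amount)), or (0.0, None) if no donation pays off."""
    src = max(range(len(loads)), key=lambda i: loads[i])
    best: Tuple[float, Optional[Tuple[int, float]]] = (0.0, None)
    for dst in range(len(loads)):
        if dst == src:
            continue
        amount = step
        while amount <= loads[src]:
            reward = donation_reward(loads, src, dst, amount, cost_per_unit)
            if reward > best[0]:
                best = (reward, (dst, amount))
            amount += step
    return best


# Hypothetical per-processor workload histories over three time steps.
histories = [[6.0, 8.0, 10.0], [4.0, 3.0, 2.0], [4.0, 4.0, 4.0]]
estimates = [extrapolate_workload(h) for h in histories]  # [12.0, 1.0, 4.0]
print(best_donation(estimates, cost_per_unit=0.25))  # (3.75, (1, 5.0))
```

With a higher per-unit cost (for example `cost_per_unit=2.0`), no donation pays off and the search returns `(0.0, None)`. In the approach described above, RL agents learn such trade-offs on the fly from reward and cost signals instead of enumerating every candidate donation.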
Committee
Han-Wei Shen (Advisor)
Rephael Wenger (Committee Member)
Jian Chen (Committee Member)
Pages
184 p.
Subject Headings
Computer Engineering; Computer Science
Keywords
scientific visualization; feature extraction; feature visualization; spatiotemporal analysis; distributed and parallel computing; dynamic load balancing; asynchronous parallelism
Recommended Citations
APA Style (7th edition)
Xu, J. (2021). Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu163875260837876
MLA Style (8th edition)
Xu, Jiayi. Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism. 2021. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu163875260837876.
Chicago Manual of Style (17th edition)
Xu, Jiayi. "Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism." Doctoral dissertation, Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu163875260837876
Document number: osu163875260837876
Copyright Info
© 2021, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.