Files
Zhenhuan Sui Dissertation v29.pdf (1.62 MB)
Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation
Author Info
SUI, ZHENHUAN
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637
Year and Degree
2017, Doctor of Philosophy, Ohio State University, Industrial and Systems Engineering.
Abstract
Many decision problems are set in changing environments. For example, the optimal investment in cyber maintenance depends on whether there is evidence of an unusual vulnerability, such as “Heartbleed”, that is causing an especially high rate of incidents. This gives rise to the need for timely information to update decision models so that optimal policies can be generated for each decision period. Social media provides a streaming source of relevant information, but that information needs to be efficiently transformed into numbers to enable the needed updates. This dissertation first explores the use of social media as an observation source for timely decision-making. To efficiently generate the observations for Bayesian updates, the dissertation proposes a novel computational method, called K-means Latent Dirichlet Allocation (KLDA), to fit an existing clustering model. The method is illustrated using a cyber security problem related to changing maintenance policies during periods of elevated risk. The dissertation also studies four text corpora with 100 replications and shows that KLDA is associated with significantly reduced computational times and more consistent model accuracy compared with collapsed Gibbs sampling.
Because social media is becoming more popular, researchers have begun applying text analytics models and tools to extract information from social media platforms. Many of these text analytics models are based on Latent Dirichlet Allocation (LDA), but they are often poor estimators of topic proportions for emerging topics. Therefore, the second part of the dissertation proposes a visual summarizing technique based on topic models, a point system, and Twitter feeds to support passive summarizing and sensemaking. The associated “importance score” point system is intended to mitigate this weakness of topic models. The proposed method is called the TWitter Importance Score Topic (TWIST) summarizing method. TWIST uses the topic proportion outputs for tweets and assigns importance points to present trending topics. It generates a chart showing the important and trending topics discussed over a given time period. The dissertation illustrates the methodology using two case studies from the cyber security field.
Finally, the dissertation proposes a general framework to teach engineers and practitioners how to work with text data. As an extension of Exploratory Data Analysis (EDA) in quality improvement problems, Exploratory Text Data Analysis (ETDA) takes text as the input data, and the goal is to extract useful information from the text inputs to explore potential problems and causal effects. This part of the dissertation presents a practical framework for ETDA in quality improvement projects with four major steps: pre-processing text data, text data processing and display, salient feature identification, and salient feature interpretation. Case studies are presented alongside the major steps, and the steps are discussed with various visualization techniques available in ETDA.
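The four ETDA steps named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the stop-word list, the sample quality records, the function names, and the 50% document-share threshold used to flag salient terms are all assumptions made up for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical stop-word list and records, invented for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "was", "after"}

def preprocess(text):
    """Step 1 (pre-processing): lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_counts(documents):
    """Step 2 (processing): count how many documents mention each term."""
    counts = Counter()
    for doc in documents:
        counts.update(set(preprocess(doc)))
    return counts

def display(counts, top_n=5):
    """Step 2 (display): render a plain-text bar chart of the top terms."""
    return "\n".join(f"{term:<10} {'#' * n}" for term, n in counts.most_common(top_n))

def salient_terms(counts, n_docs, min_share=0.5):
    """Step 3 (salient features): terms found in at least min_share of documents.

    The threshold rule is an assumption, not a rule taken from the dissertation.
    """
    return sorted(t for t, n in counts.items() if n / n_docs >= min_share)

records = [
    "Paint defect on the door panel after curing",
    "Door panel scratched in assembly",
    "Paint bubbles on hood after curing",
    "Loose bolt on the door hinge",
]

counts = term_counts(records)
chart = display(counts)
salient = salient_terms(counts, len(records))
print(chart)
print("Salient terms:", salient)
```

The fourth step, salient feature interpretation, is left to the analyst: in this toy example, the flagged terms would prompt a closer look at whether door-related or paint-curing records point to a common cause.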
Committee
Theodore Allen (Advisor)
Steven MacEachern (Committee Member)
Cathy Xia (Committee Member)
Nena Couch (Other)
Pages
126 p.
Subject Headings
Finance; Industrial Engineering; Operations Research; Statistics; Systems Science
Keywords
Natural Language Processing, NLP, Machine Learning, Bayesian Statistics, Hierarchical Text Topic Modeling, Text Analytics, Cyber Maintenance, Decision Analysis, Quality Hypothesis Generation, Latent Dirichlet Allocation, Financial Engineering
Recommended Citations
APA Style (7th edition)
SUI, Z. (2017). Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637
MLA Style (8th edition)
SUI, ZHENHUAN. Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation. 2017. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637.
Chicago Manual of Style (17th edition)
SUI, ZHENHUAN. "Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637
Document number: osu1499446404436637
Download Count: 819
Copyright Info
© 2017, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.