Doctor of Philosophy, The Ohio State University, 2011, Industrial and Systems Engineering
Engineers face many quality-related datasets containing free-style text or images. For example, a database could include summaries of complaints filed by customers, descriptions of the causes of rework or maintenance and of the associated actions taken, or a collection of quality inspection images of welded tubes. The goal of this dissertation is to enable engineers to input a database of free-style text or image data and then obtain a set of clusters or “topics” with intuitive definitions, along with information about their degree of commonality, that together help prioritize system improvement. The proposed methods generate Pareto charts of ranked clusters or topics whose interpretability is improved by input from the analyst or method user. The combination of subject matter expert data with standard data is the novel feature of the methods considered. Prior to the methods proposed here, analysts applied Bayesian mixture models and had limited recourse if the cluster or topic definitions were not interpretable or were at odds with the knowledge of subject matter experts.
The associated “Subject Matter Expert Refined Topic” (SMERT) model permits ongoing knowledge elicitation and integration of high-level human expert data. It addresses two issues: (1) unsupervised topic models often produce results that are not interpretable to the user, and (2) human experts need a way to interact with the model results, for which a “Hierarchical Analysis Designed Latency Experiment” (HANDLE) is provided. If groupings are missing key elements, “boosting” these elements is possible. If certain members of a cluster are nonsensical or nonphysical, “zapping” these elements is possible. We also describe a fast Collapsed Gibbs Sampling (CGS) algorithm for the SMERT method, which makes it possible to fit SMERT models to large datasets efficiently but which relies on approximations in certain cases.
We use three case studies to illustrate the proposed methods. The first relates to scrap text reports for a Ch…
Committee: Theodore Allen PhD (Advisor); Suvrajeet Sen PhD (Committee Member); David Woods PhD (Committee Member)
Subjects: Computer Science; Engineering; Industrial Engineering; Information Technology