Search Results

(Total results 19)

  • 1. Fraser, Kimberly DETERMINING STRUCTURE AND GROWTH CHARACTERISTICS OF OXIDE HETEROSTRUCTURES THROUGH DEPOSITION AND DATA SCIENCE: TOWARDS SINGLE CRYSTAL BATTERIES

    Doctor of Philosophy, Case Western Reserve University, 2023, Materials Science and Engineering

    A deeper understanding of processing-structure relationships has been developed with the goal of building single crystal devices using pulsed laser deposition (PLD) and advancing the application of data science to materials science. The targeted device was a half-cell lithium-ion battery, where strontium ruthenate (SRO) is the current collector, lithium cobalt oxide (LCO) is the cathode, and lithium lanthanum titanate (LLTO) is the electrolyte. These were grown on a strontium titanate (STO) substrate. Through studies of the processing parameters and film characteristics, conditions to grow a single crystal LCO/SRO/STO heterostructure were revealed. While the addition of the electrolyte affected the single crystal structure and interfacial quality, underlying reasons have been illuminated to guide further development of multi-layer oxide heterostructures. An in-situ technique called reflection high-energy electron diffraction (RHEED) is commonly coupled with PLD to provide information on structure-property relationships by recording the diffraction pattern of the film during growth. Traditionally, a small percentage of the data provided is used in analysis. Here, data science techniques, both supervised and unsupervised, are applied to reveal additional information from the full data set. As a result, the sensitivity of the length of diffraction spots over other parameters (e.g., width or intensity) to growth characteristics has been uncovered, especially in later stages of growth where the data is dominated by the reflection from the film. Additionally, through unsupervised learning, a phase shift in the intensity oscillations of different RHEED spots was uncovered. Non-negative matrix factorization, among other techniques, was used to deconvolute information from different diffraction spots. It was revealed that (01) and (0-1) spots are better indicators of thin film growth characteristics especially in material systems that grow in layer-by-layer or step-flow mechan (open full item for complete abstract)

    Committee: Alp Sehirlioglu Dr. (Advisor); Xuan Gao Dr. (Committee Member); Roger French Dr. (Committee Member); Frank Ernst Dr. (Committee Member) Subjects: Chemical Engineering; Chemistry; Computer Science; Engineering; Materials Science; Statistics
  • 2. Groeger, Alexander Texture-Driven Image Clustering in Laser Powder Bed Fusion

    Master of Science (MS), Wright State University, 2021, Computer Science

    The additive manufacturing (AM) field is striving to identify anomalies in laser powder bed fusion (LPBF) using multi-sensor in-process monitoring paired with machine learning (ML). In-process monitoring can reveal the presence of anomalies, but creating an ML classifier requires labeled data. The present work approaches this problem by printing hundreds of Inconel-718 coupons with different processing parameters to capture a wide range of process monitoring imagery with multiple sensor types. Afterwards, the process monitoring images are encoded into feature vectors and clustered to isolate groups in each sensor modality. Four texture representations were learned by training two convolutional neural network texture classifiers on two general texture datasets for clustering comparison. The results demonstrate that unsupervised texture-driven clustering can isolate roughness categories and process anomalies in each sensor modality. These groups can be labeled by a field expert and potentially be used for defect characterization in process monitoring.

    Committee: Tanvi Banerjee Ph.D. (Advisor); Thomas Wischgoll Ph.D. (Committee Member); John Middendorf Ph.D. (Committee Member) Subjects: Computer Science; Materials Science
  • 3. Hussein, Abdul Aziz Identifying Crime Hotspot: Evaluating the suitability of Supervised and Unsupervised Machine learning

    MS, University of Cincinnati, 2021, Education, Criminal Justice, and Human Services: Information Technology

    Identifying crime hotspot locations is an important endeavor in ensuring public safety. Being able to identify these locations effectively and accurately provides useful information to law enforcement bodies and helps minimize criminal activity. Considering the limited resources available to law enforcement, a more prudent approach is to deploy these resources at places that record a considerably higher crime rate. We depart from the traditional “higher than” average thresholds and rely instead on a more pragmatic approach in the analysis. We analyze five years of crime data from the Cincinnati Police Department using clustering algorithms such as K-means, DBSCAN, and hierarchical algorithms, and classification algorithms such as Random Forest, SVM, Logistic Regression, KNN, and Naive Bayes, on the same dataset. The clustering methods are used as a standalone means of identifying crime hotspots rather than as a data preprocessing step, as was done in prior experiments. The results from both approaches are compared using their respective evaluation metrics. We find that classification performed better than clustering on our dataset. The best-performing algorithm is Random Forest with 30 trees. We also find considerable crime concentration along the hotspot street segments that were identified in the dataset.

    Committee: M. Murat Ozer Ph.D. (Committee Chair); Nelly Elsayed Ph.D. (Committee Member) Subjects: Information Technology
  • 4. Kondapalli, Swetha An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies

    Master of Science in Industrial and Human Factors Engineering (MSIHE), Wright State University, 2020, Industrial and Human Factors Engineering

    Emergency Medical Service (EMS) providers are the first responders for an injured patient on the field. Their assessment of patient injuries and determination of an appropriate hospital play a critical role in patient outcomes. A majority of states in the US have established a state-level governing body (e.g., EMS Division) that is responsible for developing and maintaining a robust EMS system throughout the state. Such divisions develop standards, accredit EMS agencies, oversee the trauma system, and support new initiatives through grants and training. But to do so, these divisions require data to enable them to first understand the similarities between existing EMS agencies in the state in terms of their resources and activities. Benchmarking them against similar peer groups could then reveal best practices among top performers in terms of patient outcomes. While limited qualitative data exists in the literature based on surveys of EMS personnel related to their working environment, training, and stress, what is lacking is a quantitative approach that can help compare and contrast EMS agencies across a comprehensive set of factors and enable benchmarking. Our study fills this gap by proposing a data-driven approach to cluster EMS agencies (by county level) and subsequently benchmark them against their peers using two patient safety performance measures, under-triage (UT) and over-triage (OT). The study was conducted in three phases: data collection, clustering, and benchmarking. We first obtained data related to the trauma-specific capabilities, volume, and Performance Improvement activities. This data was collected by our collaborating team of health services researchers through a survey of over 300 EMS agencies in the state of OH. To estimate UT and OT, we used 6,002 de-identified patient records from 2012 made available by the state of Ohio's EMS Division. All the data was aggregated at county level. We then used several clustering methods to group counties us (open full item for complete abstract)

    Committee: Pratik J. Parikh Ph.D. (Advisor); Subhashini Ganapathy Ph.D. (Committee Member); Corrine Mowrey Ph.D. (Committee Member) Subjects: Computer Science; Industrial Engineering; Statistics
  • 5. Campbell, Benjamin Supervised and Unsupervised Machine Learning Strategies for Modeling Military Alliances

    Doctor of Philosophy, The Ohio State University, 2019, Political Science

    When modeling interstate military alliances, scholars make simplifying assumptions. However, most recognize these often-invoked assumptions are overly simplistic. This dissertation leverages developments in supervised and unsupervised machine learning to assess the validity of these assumptions and examine how they influence our understanding of alliance politics. I uncover a series of findings that help us better understand the causes and consequences of alliances. The first assumption examined holds that states, when confronted by a common external security threat, form alliances to aggregate their military capabilities in an effort to increase their security and ensure their survival. Many within diplomatic history and security studies criticize this widely accepted "Capability Aggregation Model", noting that countries have various motives for forming alliances. In the first of three articles, I introduce an unsupervised machine learning algorithm designed to detect variation in how actors form relationships in longitudinal networks. This allows me to, in the second article, assess the heterogeneous motives countries have for forming alliances. I find that states form alliances to achieve foreign policy objectives beyond capability aggregation, including the consolidation of non-security ties and the pursuit of domestic reform. The second assumption is invoked when scholars model the relationship between alliances and conflict, routinely assuming that the formation of an alliance is exogenous to the probability that one of the allies is attacked. This stands in stark contrast to the Capability Aggregation Model's expectations, which indicate that an external threat and an ally's expectation of attack by an aggressor influence the decision to form an alliance. In the final article, I examine this assumption and the causal relationship between alliances and conflict. Specifically, I endogenize alliances on the causal path to conflict using supe (open full item for complete abstract)

    Committee: Skyler Cranmer (Committee Chair); Janet Box-Steffensmeier (Committee Member); Bear Braumoeller (Committee Member); Christopher Gelpi (Committee Member) Subjects: Artificial Intelligence; Behavioral Sciences; Computer Science; International Relations; Military History; Peace Studies; Political Science; Statistics; World History
  • 6. Zhang, Pin Nonlinear Semi-supervised and Unsupervised Metric Learning with Applications in Neuroimaging

    Doctor of Philosophy (PhD), Ohio University, 2018, Electrical Engineering & Computer Science (Engineering and Technology)

    In many machine learning and data mining algorithms, pairwise distances (or dissimilarities) among data samples are computed based on the Euclidean metric, where all feature components are treated equally and assigned the same weight. Learning a customized metric from the input data can often significantly improve the performance of the algorithms. In this dissertation, we propose two nonlinear distance metric learning (DML) frameworks to boost the performance of semi-supervised learning (SSL) and unsupervised learning (USL) algorithms, respectively. Formulated under a constrained optimization framework, our proposed SSL-DML method learns a smooth nonlinear feature space transformation that makes the input data samples more linearly separable in Laplacian SVM (LapSVM). Our USL-DML solution, on the other hand, aims to increase the data's linear separability for k-means. A geometric model called Coherent Point Drifting (CPD) is utilized in both frameworks to move data points towards more desirable locations. CPD was chosen for two reasons: 1) its remarkable capability in generating high-order yet smooth deformations; and 2) the available mechanism within CPD for assigning different levels of smoothness to data points. Application-wise, we apply our SSL-DML to predict the conversion to Alzheimer's Disease (AD) from its early stage, Mild Cognitive Impairment (MCI). The proposed USL-DML solution is utilized to improve patient clustering. Using neuroimage data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, we evaluate the effectiveness of the proposed frameworks. The experimental results demonstrate improvements over the state-of-the-art solutions within the same category.

    Committee: Jundong Liu (Advisor) Subjects: Computer Science; Electrical Engineering
  • 7. Doan, Charles Connecting Unsupervised and Supervised Categorization Behavior from a Parainformative Perspective

    Doctor of Philosophy (PhD), Ohio University, 2018, Experimental Psychology (Arts and Sciences)

    An intriguing and unsolved problem in cognitive science concerns the nature of and the relationship between unsupervised and supervised categorization behavior. The former refers to assessing how observers naturally sort multidimensional objects into groups and investigating whether they can learn more complicated groupings without external feedback from the environment. Conversely, the latter refers to experimental investigations aiming to predict and explain how observers inductively learn a predetermined grouping of stimuli upon receiving “correct” or “incorrect” feedback after each classification response. Although these approaches are very different, a few attempts have been put forth with the goal of connecting behavioral outcomes between the two tasks. In general, these investigations implement both types of tasks and seek to explain the results under a common theoretical or formal framework. Although the results are promising, there is a lack of consensus regarding which theoretical or formal approach best accounts for the data. Following this tradition of integration, we present a novel attempt at connecting unsupervised and supervised categorization behavior. We employ generalized invariance structure theory (GIST; Vigo 2013, 2014), generalized representational information theory (GRIT; Vigo 2011, 2012, 2014), and their associated formal models to predict and explain results from two separate experiments. For the first set of experiments, we assessed unsupervised categorization and associated learning behavior by employing a “construction” task previously implemented by the authors (Doan & Vigo, 2016). Importantly, we modified the procedure in accord with techniques similar to those found in prior investigations to facilitate establishing the connection between unsupervised and supervised learning behavior. We replicated Doan and Vigo (2016) and also observed a decrease in response times for each of the three sub-experiments, suggesting participants (open full item for complete abstract)

    Committee: Ronaldo Vigo PhD (Committee Chair); Keith Markman PhD (Committee Member); Kimberly Rios PhD (Committee Member); Robert Briscoe PhD (Committee Member); Chao-Yang Lee PhD (Committee Member) Subjects: Behavioral Sciences; Cognitive Psychology; Experimental Psychology; Information Science
  • 8. Eldridge, Justin Clustering Consistently

    Doctor of Philosophy, The Ohio State University, 2017, Computer Science and Engineering

    Clustering is the task of organizing data into natural groups, or clusters. A central goal in developing a theory of clustering is the derivation of correctness guarantees which ensure that clustering methods produce the right results. In this dissertation, we analyze the setting in which the data are sampled from some underlying probability distribution. In this case, an algorithm is "correct" (or consistent) if, given larger and larger data sets, its output converges in some sense to the ideal cluster structure of the distribution. In the first part, we study the setting in which data are drawn from a probability density supported on a subset of a Euclidean space. The natural cluster structure of the density is captured by the so-called high density cluster tree, which is due to Hartigan (1981). Hartigan introduced a notion of convergence to the density cluster tree, and recent work by Chaudhuri and Dasgupta (2010) and Kpotufe and von Luxburg (2011) has constructed algorithms which are consistent in this sense. We will show that Hartigan's notion of consistency is in fact not strong enough to ensure that an algorithm recovers the density cluster tree as we would intuitively expect. We identify the precise deficiency which allows this, and introduce a new, stronger notion of convergence which we call consistency in merge distortion. Consistency in merge distortion implies Hartigan's consistency, and we prove that the algorithm of Chaudhuri and Dasgupta (2010) satisfies our new notion. In the sequel, we consider the clustering of graphs sampled from a very general, non-parametric random graph model called a graphon. Unlike in the density setting, clustering in the graphon model is not well-studied. We therefore rigorously analyze the cluster structure of a graphon and formally define the graphon cluster tree. We adapt our notion of consistency in merge distortion to the graphon setting and identify efficient, consistent algorithms.

    Committee: Mikhail Belkin PhD (Advisor); Yusu Wang PhD (Advisor); Facundo Mémoli PhD (Committee Member); Vincent Vu PhD (Committee Member) Subjects: Artificial Intelligence; Computer Science; Statistics
  • 9. Awodokun, Olugbenga Classification of Patterns in Streaming Data Using Clustering Signatures

    MS, University of Cincinnati, 2017, Engineering and Applied Science: Electrical Engineering

    Streaming datasets often pose a myriad of challenges for machine learning algorithms, including insufficient storage and changes in the underlying distributions of the data during different time intervals. This thesis proposes a hierarchical-clustering-based method (unsupervised learning) for determining signatures of data in a time window and thus building a classifier based on the match between the observed clusters and known patterns of clustering. When new clusters are observed, they are added to a global list of possible clusters, which is used to generate a signature for the data in a time window. Dendrograms are created from each time window, and their clusters are compared to the global list of clusters. The global list is updated only if none of the existing global clusters can model the data points in a later time window. The global clusters are then used in the testing phase to classify novel data chunks according to their Tanimoto similarities. Although the training samples were taken from only 20% of the entire KDD Cup 99 dataset, we validated our approach using test data from different regions of the dataset at multiple intervals, and the classifier performance achieved was comparable to that of other methods that used the entire dataset for training.

    Committee: Raj Bhatnagar Ph.D. (Committee Chair); Gowtham Atluri (Committee Member); Nan Niu Ph.D. (Committee Member) Subjects: Computer Science
  • 10. Halsey, Phillip The Nature of Modality and Learning Task: Unsupervised Learning of Auditory Categories

    Doctor of Philosophy (PhD), Ohio University, 2015, Experimental Psychology (Arts and Sciences)

    Categorization and concept-learning have a long-standing influence on the field of psychology because the notions of concept-learning are key to how individuals learn. Central to this idea are several questions: How do we categorize stimuli that vary according to different dimensions? How do we categorize stimuli under different conditions? How do we store these categories as mental representations? And does the modality of the stimuli affect our construction of a mental concept, and to what extent does this affect categorization behavior? To partially answer this last question, it has been determined that the modality of a stimulus does influence categorization behavior, but the extent of this is unknown. The current dissertation explores the manner in which stimulus modality, relationships between stimulus dimensions, and learning method affect categorization behavior. Two experiments are conducted in order to examine the auditory dimensions individuals attend to when making comparisons, and how individuals spontaneously categorize auditory stimuli based on the attended dimensions. Participants' data were then examined according to three models of unsupervised learning: the simplicity model, SUSTAIN, and GISTM.

    Committee: Ronaldo Vigo (Advisor); Steve Evans (Committee Member); Keith Markman (Committee Member); Robert Briscoe (Committee Member); Mark Phillips (Committee Member) Subjects: Cognitive Psychology; Psychology
  • 11. Mirzaei, Golrokh Data Fusion of Infrared, Radar, and Acoustics Based Monitoring System

    Doctor of Philosophy, University of Toledo, 2014, Engineering

    Many bird and bat fatalities have been reported in the vicinity of wind farms. An acoustic, infrared camera, and marine radar based system is developed to monitor the nocturnal migration of birds and bats. The system is deployed and tested in an area of potential wind farm development. The area is also a stopover for migrating birds and bats. Multi-sensory data fusion is developed based on acoustics, infrared camera (IR), and radar. The diversity of the sensor technologies complicated its development. Different signal processing techniques were developed for processing the various types of data. Data fusion is then implemented from three diverse sensors in order to make inferences about the targets. This approach reduces uncertainties and provides a desired level of confidence and detailed information about the patterns. This work is a unique, multifidelity, and multidisciplinary approach based on pattern recognition, machine learning, signal processing, bio-inspired computing, probabilistic methods, and fuzzy reasoning. Sensors were located in the western basin of Lake Erie in Ohio and were used to collect data over the migration period of 2011 and 2012. Acoustic data were collected using acoustic detectors (SM2 and SM2BAT). Data were preprocessed to convert the recorded files to standard wave format. Acoustic processing was performed in two steps: feature extraction and classification. Acoustic features of bat echolocation calls were extracted based on three different techniques: Short Time Fourier Transform (STFT), Mel Frequency Cepstrum Coefficient (MFCC), and Discrete Wavelet Transform (DWT). These features were fed into an Evolutionary Neural Network (ENN) for their classification at the species level using acoustic features. Results from different feature extraction techniques were compared based on classification accuracy. The technique can identify bats and will contribute towards developing mitigation procedures for reducing bat fata (open full item for complete abstract)

    Committee: Mohsin Jamali Dr. (Committee Chair); Jackson Carvalho Dr. (Committee Member); Mohammed Niamat Dr. (Committee Member); Richard Molyet Dr. (Committee Member); Mehdi Pourazady Dr. (Committee Member) Subjects: Biology; Computer Engineering; Computer Science; Ecology; Electrical Engineering; Energy; Engineering
  • 12. CAO, BAOQIANG ON APPLICATIONS OF STATISTICAL LEARNING TO BIOPHYSICS

    PhD, University of Cincinnati, 2007, Arts and Sciences : Physics

    In this dissertation, we develop statistical and machine learning methods for problems in biological systems and processes. In particular, we are interested in two problems–predicting structural properties for membrane proteins and clustering genes based on microarray experiments. In the membrane protein problem, we introduce a compact representation for amino acids, and build a neural network predictor based on it to identify transmembrane domains for membrane proteins. Membrane proteins are divided into two classes based on the secondary structure of the parts spanning the bilayer lipids: alpha-helical and beta-barrel membrane proteins. We further build a support regression model to predict the lipid exposed levels for the amino acids within the transmembrane domains in alpha-helical membrane proteins. We also develop methods to predict pore-forming residues for beta-barrel membrane proteins. In the other problem, we apply a context-specific Bayesian clustering model to cluster genes based on their expression levels and cDNA copy numbers. This dissertation is organized as follows. Chapter 1 introduces the most relevant biology and statistical and machine learning methods. Chapters 2 and 3 focus on prediction of transmembrane domains for the alpha-helix and the beta-barrel, respectively. Chapter 4 discusses the prediction of relative lipid accessibility, a different structural property for membrane proteins. The final chapter addresses the gene clustering approach.

    Committee: Dr. Mark Jarrell (Advisor) Subjects: Physics, Molecular
  • 13. Warren, Emily Machine Learning for Road Following by Autonomous Mobile Robots

    Master of Sciences (Engineering), Case Western Reserve University, 2008, EECS - Computer Engineering

    This thesis explores the use of machine learning in the context of autonomous mobile robots driving on roads, with the focus on improving the robot's internal map. Early chapters cover the mapping efforts of DEXTER, Team Case's entry in the 2007 DARPA Urban Challenge. Competent driving may include the use of a priori information, such as road maps, and online sensory information, including vehicle position and orientation estimates in absolute coordinates as well as error coordinates relative to a sensed road. An algorithm may select the best of these typically flawed sources, or more robustly, use all flawed sources to improve an uncertain world map, both globally in terms of registration corrections and locally in terms of improving knowledge of obscured roads. It is shown how unsupervised learning can be used to train recognition of sensor credibility in a manner applicable to optimal data fusion.

    Committee: Wyatt Newman PhD (Advisor); M. Cenk Cavusoglu PhD (Committee Member); Francis Merat PhD (Committee Member) Subjects: Computer Science; Engineering; Robots
  • 14. Mariam-Smith, Arshiya Identification and Prediction of Clinical Analogue Cohorts in Electronic Health Records

    Doctor of Philosophy, Case Western Reserve University, 2024, Biomedical and Health Informatics

    Leveraging longitudinal clinical data can provide additional information for improvements in precision medicine. While subtypes of various diseases, e.g., Alzheimer's disease, have distinct longitudinal clinical manifestations, this information is seldom used, presenting a missed opportunity for disease characterization. Here, we broadly investigate the relevance of longitudinal phenotypes in clinical settings by: i) showing robust identification and prediction of new longitudinal phenotypes in a clinical trial setting. We used a temporal matching algorithm (i.e., dynamic time warping) in clustering HbA1c measurements and identified four subtypes of glycemia response in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial, which investigated diabetes management in patients with high cardiovascular risk (CVD). One subtype, C4, had treatment-mediated reduced CVD risk, and could be predicted with high accuracy (AUC=0.98). ii) Identification of robust temporal matching algorithms in real-world, simulated, longitudinal clinical data. Longitudinal clinical data is distinct from the audio and video signal data used to develop these algorithms. We identified robust algorithms through systematic evaluation in simulated data and applied them to identify five distinct body mass index patterns with modified risk of metabolic syndrome in a large pediatric cohort (N>43,000). iii) Development of clinical analogue cohorts (CACs) to set the stage for multi-disease prediction. Approximately 33% of adults and 75% of older adults (> 65 years old) in developed countries are impacted by multiple chronic conditions (CCs), which are linked to increased medication use, specialist care and emergency services. Using diagnoses trajectories retrieved from the UK Biobank, we identified eight stable CACs across the life cycle in both females and males. These CACs had distinct CC risk profiles and genetic predispositions. For example, CAC-10 (males) had increased risk of prostate (open full item for complete abstract)

    Committee: David Kaelber (Committee Member); Xiaofeng Zhu (Committee Member); Jessica Cooke Bailey (Committee Member); William Bush (Committee Chair); Daniel Rotroff (Advisor) Subjects: Biomedical Research
  • 15. Perera, Gamage Upeksha Covariance Structure Analysis for Deep Gaussian Mixture models

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2023, Data Science

    Deep Gaussian Mixture Models (DGMMs) are probabilistic models that combine multiple layers of latent variables (Viroli and McLachlan, 2019). DGMMs adeptly capture intricate, non-linear interactions among variables, facilitating efficient unsupervised learning. This study strives to improve DGMMs by introducing a method to systematically choose optimal covariance structures for each DGMM layer. Our proposal involves a closed multiple-testing procedure that utilizes the likelihood-ratio test to select the most suitable covariance structure from a set of candidate structures. Our proposal is inspired by the likelihood-ratio method proposed by Greselin and Punzo (2013). Typically, information criteria are widely used in the context of model selection, but they have different properties and may be more appropriate for different types of data or modeling situations. Additionally, the selection of a specific information criterion is subjective and many practitioners tend to use a particular method routinely, which can limit the potential for discovering the best covariance structure for the data at hand (Greselin and Punzo, 2013; Punzo et al., 2016). The proposed method draws inspiration from McNicholas and Murphy (2008), in the context of mixture factor analyzers, where constraints are applied to covariance structures. In Deep Gaussian Mixture Models (DGMMs), these covariance structures can be defined at each layer, creating a range of complexities. To aid covariance structure selection in DGMMs, it is assumed that each cluster within a layer shares the same covariance structure. The chosen structure achieves a balance between model complexity, enhancing performance and predictive accuracy. The method employs a closed multiple-testing approach based on the likelihood ratio test, comparing likelihoods of different covariance structures for the DGMM. We conduct a series of simulations considering multiple heteroscedasticity configurations that represent different cova (open full item for complete abstract)

    Committee: Junfeng Shang Ph.D. (Committee Chair); Lauren Maziarz Ph.D. (Other); Hanfeng Chen Ph.D. (Committee Member); Rob Green Ph.D. (Committee Member) Subjects: Statistics
  • 16. Dai, Honghao Unsupervised Learning Using Change Point Features Of Time-Series Data For Improved PHM

    PhD, University of Cincinnati, 2023, Engineering and Applied Science: Mechanical Engineering

    Prognostics and health management (PHM), which aims to convert preventive maintenance (periodical maintenance) into predictive maintenance (condition-based maintenance), has gained increasing attention in the current era of the Internet of Things (IoT), Industry 4.0, and Industrial AI. A significant amount of research has been conducted using a variety of signal processing, statistical analysis, and machine learning algorithms to develop different PHM systems. Feature learning is a crucial task in bridging the gap between data and models. Time-series data in sensor environments exhibit continuous changes and drifts, which require PHM models to balance static and time-independent uncertainty for feature learning. In this dissertation, a novel deep autoencoder with time-lagged regularization is proposed. This method can learn features from the time-domain and frequency-domain and detect underlying weak-sense stationarity. A change point detection strategy is developed by combining the time-lagged autoencoder with a dissimilarity-based anomaly detector. The effectiveness of the proposed change point detection algorithm is validated using public benchmarking datasets, fault detection and prognostics of ion milling etching machine data, non-artificial segments recognition, and long-term assessment of intracranial pressure signals. The proposed methodology is compared with state-of-the-art benchmark approaches and found to establish an improved PHM model with sustainable performance in discovering change point features in time-series signals.

    Committee: Jay Lee Ph.D. (Committee Chair); Brandon Foreman M.D. (Committee Member); Jing Shi Ph.D. (Committee Member); Jay Kim Ph.D. (Committee Member); Xiaodong Jia Ph.D. (Committee Member) Subjects: Mechanical Engineering
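
    The idea of a dissimilarity-based change point detector mentioned in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's method: it omits the time-lagged autoencoder entirely and, as an assumption made only for illustration, scores each index by the absolute difference between the means of the windows before and after it. The function names, the window size, and the toy signal are all hypothetical.

```python
from statistics import mean


def change_point_scores(signal, window):
    """Score each index by the absolute difference between the mean of the
    `window` samples before it and the mean of the `window` samples after it."""
    scores = []
    for i in range(window, len(signal) - window + 1):
        before = signal[i - window:i]
        after = signal[i:i + window]
        scores.append((i, abs(mean(after) - mean(before))))
    return scores


def detect_change_point(signal, window=5):
    """Return the index with the highest dissimilarity score."""
    return max(change_point_scores(signal, window), key=lambda s: s[1])[0]


# Toy signal invented for illustration: the level shifts from 0.0 to 1.0 at index 20.
toy_signal = [0.0] * 20 + [1.0] * 20
change_index = detect_change_point(toy_signal, window=5)
```

    In a learned-feature setting, the raw samples would presumably be replaced by feature vectors and the mean difference by a suitable distance, but that step is outside what the abstract specifies.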
  • 17. Mathur, Nitin Application of Autoencoder Ensembles in Anomaly and Intrusion Detection using Time-Based Analysis

    MS, University of Cincinnati, 2020, Education, Criminal Justice, and Human Services: Information Technology

    Signature-based intrusion detection methods report high accuracy with low false alarm rates. However, they do not perform well when faced with new or emerging threats. This work focuses on anomaly-based, data-driven methods to identify potential zero-day attacks using a specific class of neural networks known as the autoencoder. The significance of this study is that explicit labels are not used in the training process, and rather than categorizing each individual flow or packet, the time dimension, which has often been ignored in the literature, is leveraged to identify traffic that does not conform to the normal or expected behavior.

    Committee: Chengcheng Li Ph.D. (Committee Chair); Bilal Gonen Ph.D. (Committee Member); Kijung Lee Ph.D. (Committee Member) Subjects: Information Technology
  • 18. Erdmann, Alexander Practical Morphological Modeling: Insights from Dialectal Arabic

    Doctor of Philosophy, The Ohio State University, 2020, Linguistics

    This thesis treats a major challenge for current state-of-the-art natural language processing (NLP) pipelines: morphologically rich languages where many inflected forms or weak form-meaning correspondence lead to data sparsity and noise. For example, if the lexeme TEACHER occurs the same number of times in an English text and an Arabic text, those occurrences will be spread over just four forms in English, teacher, teacher's, teachers' and teachers, versus numerous forms in Arabic, leading to more low-frequency and out-of-vocabulary forms at test time. Furthermore, while the +s suffix of teachers is highly predictable, there is significant entropy involved in predicting how pluralization will be realized in Arabic, which can cause models to be noisy. That said, the particular means of realizing pluralization (among other properties) can be informative in Arabic, as the +wn in mdrswn, 'teachers', not only indicates plurality but also that the referent is human. To address data sparsity and noise from morphological richness, I propose some practical means of inducing morphological information and/or incorporating morphological information in preprocessing steps or model components, depending on the task at hand. The goals of this intervention are twofold. First, I aim to link variant inflections of the same lexeme to reduce sparsity. Second, I aim to mitigate noise by identifying morphosyntactic properties encoded in complex inflections like mdrswn and leveraging them to help models interpret low-frequency or out-of-vocabulary forms. To be practical, morphological modeling should be maximally language agnostic, i.e., portable to new languages or domains with minimal human effort, and maximally cheap, i.e., in terms of the amount/cost of required manual supervision. Thus, I explore morphological modeling strategies and morphological resource creation, progressing toward more language agnostic solutions requiring less supervision over the course of this thesis.
To (open full item for complete abstract)

    Committee: Marie-Catherine de Marneffe (Advisor); Micha Elsner (Committee Member); Nizar Habash (Committee Member); Andrea Sims (Committee Member) Subjects: Computer Science; Linguistics
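
    The first goal stated in the abstract above, linking variant inflections of the same lexeme to reduce sparsity, can be illustrated with a small sketch using the abstract's own English example. The lookup table, function names, and token list are hypothetical and hand-written for illustration; the thesis is about inducing such links with little supervision, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical form-to-lexeme table, written by hand for illustration only.
FORM_TO_LEXEME = {
    "teacher": "TEACHER",
    "teacher's": "TEACHER",
    "teachers'": "TEACHER",
    "teachers": "TEACHER",
}


def count_forms(tokens):
    """Count surface forms as-is (lowercased)."""
    return Counter(t.lower() for t in tokens)


def count_lexemes(tokens, table):
    """Count lexemes; forms missing from the table fall back to themselves."""
    return Counter(table.get(t.lower(), t.lower()) for t in tokens)


# Toy token list invented for illustration.
tokens = ["teacher", "teachers", "teacher's", "teachers'", "teachers"]
form_counts = count_forms(tokens)                        # four sparse entries
lexeme_counts = count_lexemes(tokens, FORM_TO_LEXEME)    # one pooled entry
```

    Pooling the four surface forms under one lexeme turns several low counts into a single higher count, which is the sparsity reduction the abstract describes.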
  • 19. Davis, Casey Using Self-Organizing Maps to Cluster Products for Storage Assignment in a Distribution Center

    Master of Science (MS), Ohio University, 2017, Industrial and Systems Engineering (Engineering and Technology)

    This thesis provides a methodology for using self-organizing maps (SOMs) to cluster stock keeping units (SKUs) based on historical order data, in order to effectively slot a forward area in a distribution center. This methodology relies on creating zones that contain SKUs that are commonly ordered together. Several of the techniques tested improve on the benchmark method, reducing the total time to complete all orders for a given zone configuration by up to 11%. Results are discussed, as well as possible future work that could improve upon the methodology.

    Committee: Dale Masel Ph.D. (Advisor); Gary Weckman Ph.D. (Committee Member); Dianna Schwerha Ph.D. (Committee Member); William Young Ph.D. (Committee Member) Subjects: Industrial Engineering
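
    The notion of SKUs that are "commonly ordered together" in the abstract above can be made concrete with a short sketch that counts how often each pair of SKUs appears in the same order. This is only an assumed preprocessing view of historical order data, not the thesis's SOM clustering, and the function name, SKU labels, and orders are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_order_counts(orders):
    """Count, for every unordered pair of SKUs, the number of orders containing both."""
    counts = Counter()
    for order in orders:
        # De-duplicate and sort so each pair has one canonical key per order.
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return counts


# Toy order history invented for illustration.
orders = [["A", "B"], ["A", "B", "C"], ["C", "D"], ["A", "B"]]
pair_counts = co_order_counts(orders)
most_common_pair = pair_counts.most_common(1)[0]
```

    Pairs with high counts, such as A and B in this toy data, are the kind of SKUs one would expect a zoning method to place in the same zone.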