Search Results

(Total results 1)

  • 1. Ahmed, Jishan. Cost-Aware Machine Learning and Deep Learning for Extremely Imbalanced Data

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2023, Data Science

    Many real-world datasets, such as those used for failure and anomaly detection, are severely imbalanced, with a relatively small number of failed instances compared to the number of normal instances. This imbalance often results in bias towards the majority class during learning, making mitigation a serious challenge. To address these issues, this dissertation leverages the Backblaze HDD data and makes several contributions to hard drive failure prediction. It begins with an evaluation of the current state-of-the-art techniques and the identification of any existing shortcomings. Multiple facets of machine learning (ML) and deep learning (DL) approaches to address these challenges are explored. The synthetic minority over-sampling technique (SMOTE) is investigated by evaluating its performance with different distance metrics and nearest neighbor search algorithms, and a novel approach that integrates SMOTE with Gaussian mixture models (GMM), called GMM SMOTE, is proposed to address various issues. Subsequently, a comprehensive analysis of different cost-aware ML techniques applied to disk failure prediction is provided, emphasizing the challenges in current implementations. The research also expands to explore a variety of cost-aware DL models, from 1D convolutional neural networks (CNN) and long short-term memory (LSTM) models to a hybrid model combining 1D CNN and bidirectional LSTM (BLSTM) approaches, to utilize the sequential nature of hard drive sensor data. A modified focal loss function is introduced to address the class imbalance issue prevalent in the hard drive dataset. The DL models are compared with traditional ML algorithms, such as random forest (RF) and logistic regression (LR), and demonstrate superior results, suggesting the potential effectiveness of the proposed focal loss function.
In addition to these efforts, this dissertation aims to provide a comprehensive understanding of hard drive longevity and the critical factors contrib (open full item for complete abstract)

    Committee: Robert C. Green II Ph.D. (Committee Chair); Liuling Liu Ph.D. (Other); Umar D Islambekov Ph.D. (Committee Member); Junfeng Shang Ph.D. (Committee Member)

    Subjects: Computer Science; Statistics
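
    Note on focal loss: the abstract mentions a modified focal loss but does not give its form. For orientation only, the sketch below shows the standard binary focal loss as it is commonly published, not the dissertation's modification. The function names and the default values of gamma and alpha are illustrative assumptions. The idea is that confidently classified examples (mostly healthy drives) contribute very little to the loss, so the rare failures dominate training.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard binary focal loss for a single example (illustrative sketch).

    p     -- predicted probability of the positive (failure) class
    y     -- true label, 0 or 1
    gamma -- focusing parameter; gamma=0 reduces to weighted cross-entropy
    alpha -- weight for the positive class; negatives get 1 - alpha
    """
    # Clamp the probability so log() never sees 0.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of easy examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over paired probabilities and labels."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

    With gamma set to 0 and alpha set to 0.5, this reduces to half the ordinary cross-entropy, which is a convenient sanity check when experimenting with other settings.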