Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform

2020, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Large collections of healthcare data are now easy to gather, especially because wearable devices have become relatively cheap. Mining this clinical data yields meaningful information that supports better decisions and improves the healthcare sector by minimizing costs. Publicly available healthcare datasets have many features, and it is impossible to identify manually the factors that contribute to a disease [1]. Machine Learning (ML) algorithms are therefore needed to identify, among a huge number of features, the most important ones for detecting the occurrence of disease. A model trained on only the top features of the dataset could then predict the disease more accurately. Because healthcare data comes from different sources and in different sizes, a cloud-based platform is needed.

The first aim of this dissertation focuses on the important field in which big data is used in healthcare to diagnose diseases before they occur or to avoid them. Breast Cancer (BC) is the second most common cancer in women after skin cancer and has become a major health issue. It is therefore very important to diagnose BC correctly and to categorize tumors as malignant or benign. ML techniques have unique advantages and are widely used to analyze complex BC datasets and predict the disease. Researchers in this field have used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset to develop predictive models for BC. In this dissertation, we propose a method for analyzing and predicting BC on the same dataset using Apache Spark. The experiments are executed on a Hadoop cluster, a cloud platform provided by the Electrical Engineering and Computer Science (EECS) department at the University of Cincinnati. Our results show that selecting the right features significantly improves the accuracy of BC prediction.
The second aim of this research focuses on the Stroke dataset. In the United States, stroke is the fifth leading cause of death and a major cause of serious disability among the adult population [9]. It is therefore crucial to predict stroke accurately so that it can be treated in its early stages. Here, we propose a method for the analysis and prediction of stroke on the same dataset using Microsoft AzureML, a cloud-based platform [10]. A further hypothesis concerns the healthcare benefits associated with regular monitoring and recognition of physical activity. There is solid evidence that regular monitoring and recognition of physical activity can help manage and reduce the risk of different types of diseases. In this dissertation, we propose a hybrid approach to analyze and recognize human activity on the same dataset using a deep learning method on a cloud-based platform. Our experimental results show that proper feature selection improves not only the accuracy of the model but also its training and testing time.
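
The idea of ranking features and keeping only the top ones, which the abstract describes, can be illustrated with a minimal sketch. This is not the dissertation's method: the abstract does not name its selection algorithm, and the experiments ran on Apache Spark and AzureML. The sketch assumes a simple Fisher score as the ranking criterion and uses synthetic two-class data in plain Python; the names `fisher_score`, `top_k_features`, and the toy dataset are hypothetical.

```python
import random
import statistics


def fisher_score(values, labels):
    """Two-class Fisher score for one feature:
    (mean0 - mean1)^2 / (var0 + var1). Higher means more separable."""
    group0 = [v for v, y in zip(values, labels) if y == 0]
    group1 = [v for v, y in zip(values, labels) if y == 1]
    between = (statistics.mean(group0) - statistics.mean(group1)) ** 2
    within = statistics.pvariance(group0) + statistics.pvariance(group1)
    return between / within if within > 0 else 0.0


def top_k_features(rows, labels, k):
    """Score every column and return the indices of the k best, plus all scores."""
    n_features = len(rows[0])
    scores = [
        fisher_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Synthetic data (an assumption for illustration, not a healthcare dataset):
# feature 0 separates the classes strongly, feature 2 weakly,
# features 1 and 3 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [
    [
        rng.gauss(3.0 * y, 1.0),
        rng.gauss(0.0, 1.0),
        rng.gauss(1.0 * y, 1.0),
        rng.gauss(0.0, 1.0),
    ]
    for y in labels
]

selected, scores = top_k_features(rows, labels, k=2)

# Keep only the selected columns; a classifier would then be trained on `reduced`.
reduced = [[row[j] for j in selected] for row in rows]

print("scores:", [round(s, 3) for s in scores])
print("selected feature indices:", selected)
```

Training on `reduced` instead of `rows` is what shrinks the training and testing time: the model sees fewer columns, and the columns it drops carry little class information.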
Marc Cahay, Ph.D. (Committee Chair)
Dharma Agrawal, D.Sc. (Committee Member)
Rui Dai, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
Manish Kumar, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
148 p.

Recommended Citations

  • Ray, S. (2020). Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697

    APA Style (7th edition)

  • Ray, Sujan. Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform. 2020. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697.

    MLA Style (8th edition)

  • Ray, Sujan. "Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform." Doctoral dissertation, University of Cincinnati, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697

    Chicago Manual of Style (17th edition)