Files
File List
38012.pdf (12.67 MB)
Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform
Author Info
Ray, Sujan
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697
Abstract Details
Year and Degree
2020, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Abstract
Large collections of healthcare data are now easy to gather, especially because of relatively cheap wearable devices. We can therefore mine clinical data and extract meaningful information, which helps in making better decisions and improves the healthcare sector by minimizing costs. Healthcare datasets available in the public domain have many features, and it is impossible to identify manually the factors that contribute to a disease [1]. It is therefore necessary to use Machine Learning (ML) algorithms to identify, from a huge number of features, the most important ones for detecting the occurrence of diseases. A model trained on only the top features of the dataset could then predict the disease more accurately. Because healthcare data comes from different sources and in different sizes, there is a need for a cloud-based platform.
The first aim of this dissertation is to focus on the important field in which big data is used in healthcare to diagnose diseases before they occur or to avoid them. Breast Cancer (BC) is the second most common cancer in women after skin cancer and has become a major health issue. As a result, it is very important to diagnose BC correctly and to categorize tumors as malignant or benign. ML techniques have unique advantages and are widely used to analyze complex BC datasets and predict the disease. Researchers in this field have used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset to develop predictive models for BC. In this dissertation, we propose a method for analyzing and predicting BC on the same dataset using Apache Spark. The experiments are executed on a Hadoop cluster, a cloud platform provided by the Electrical Engineering and Computer Science (EECS) department at the University of Cincinnati. Our results show that selecting the right features significantly improves the accuracy of BC prediction.
The second aim of this research is to focus on the Stroke dataset. In the United States, stroke is the fifth leading cause of death and a major cause of serious disability among the adult population [9]. It is therefore crucial to predict stroke accurately so that it can be treated in its early stages. Here, we propose a method for the analysis and prediction of stroke on this dataset using Microsoft AzureML, which is a cloud-based platform [10]. A further aim is to focus on the healthcare benefits associated with regular physical activity monitoring and recognition. There is solid evidence that regular monitoring and recognition of physical activity can help to manage and reduce the risk of different types of diseases. In this dissertation, we propose a hybrid approach to analyze and recognize human activity on the same dataset using a deep learning method on a cloud-based platform. Our experimental results show that selecting the features properly improves not only the accuracy of the model but also its training and testing time.
Committee
Marc Cahay, Ph.D. (Committee Chair)
Dharma Agrawal, D.Sc. (Committee Member)
Rui Dai, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
Manish Kumar, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Pages
148 p.
Subject Headings
Computer Science
Keywords
Dimensionality Reduction; Cloud Platform; Azure Machine Learning Studio; Apache Spark; Activity Recognition; Healthcare Data
Recommended Citations
APA Style (7th edition)
Ray, S. (2020). Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697
MLA Style (8th edition)
Ray, Sujan. Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform. 2020. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697.
Chicago Manual of Style (17th edition)
Ray, Sujan. "Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform." Doctoral dissertation, University of Cincinnati, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697
Document number: ucin161375080072697
Download Count: 251
Copyright Info
© 2020, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.