Search Results

(Total results 225)

  • 1. Akula, Venkata Ganesh Ashish. Implementation of Advanced Analytics on Customer Satisfaction Process in Comparison to Traditional Data Analytics

    Master of Science in Engineering, University of Akron, 2019, Mechanical Engineering

    One of the major challenges in survey data analysis is determining which methodology or technique best suits the data. The constant rise in the amount of data obtained over the years calls for effective data analysis techniques, as ineffective analysis can lead to false recommendations and reduced customer satisfaction. Therefore, the main focus of this research is to test a variety of advanced data analysis methods and determine how to improve the insights obtained from survey data for sustainable continuous improvement. The data used in this research were obtained from the AJI-2 technical training department of the Federal Aviation Administration in the form of end-of-course and post-course evaluations. In contrast to traditional survey analysis methods such as summary statistics, we systematically tested and compared the use of advanced analytics on the survey data. The Average Weighted Score, widely used in survey data analysis, differentiates the degree of surveyees' satisfaction on the survey questions and consequently provides more insightful information on course evaluations and customer satisfaction. Correlation Analysis is used to understand how responses relate to the overall satisfaction question; Contingency Analysis examines the responses surveyees chose relative to their overall satisfaction; Logistic Regression models the association of the categorical overall-satisfaction outcome with independent variables; and Cluster Analysis groups responses that share common characteristics, so that each cluster can receive a unique continuous improvement strategy. The insightful findings obtained from these advanced analytics were helpful in understanding the data patterns (open full item for complete abstract)

    Committee: Shengyong Wang PhD (Advisor); Chen Ling PhD (Committee Member) Subjects: Business Administration; Mechanical Engineering
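As a rough illustration of the pipeline in entry 1: the sketch below (simulated responses and hypothetical column names q1..q3 and satisfied; not the thesis code) computes an average weighted score over Likert items and fits a logistic regression for overall satisfaction.

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 5-point Likert survey responses; `satisfied` is a binary
# overall-satisfaction outcome. Column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "q1": rng.integers(1, 6, 200),
    "q2": rng.integers(1, 6, 200),
    "q3": rng.integers(1, 6, 200),
})
df["satisfied"] = (df.mean(axis=1) + rng.normal(0, 0.5, 200) > 3).astype(int)

# Average Weighted Score: weight each response level, then average per question.
weights = {1: -2, 2: -1, 3: 0, 4: 1, 5: 2}          # example weighting scheme
aws = df[["q1", "q2", "q3"]].replace(weights).mean()
print("Average Weighted Score per question:\n", aws)

# Logistic regression: model overall satisfaction from question responses.
model = LogisticRegression().fit(df[["q1", "q2", "q3"]], df["satisfied"])
print("Coefficients:", model.coef_)
```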
  • 2. Dutta, Soumya. In Situ Summarization and Visual Exploration of Large-scale Simulation Data Sets

    Doctor of Philosophy, The Ohio State University, 2018, Computer Science and Engineering

    Recent advancements in computing power have enabled application scientists to design simulation studies using very high-resolution computational models. The output data from such simulations provide a plethora of information that needs to be explored for an enhanced understanding of the underlying phenomena. Large-scale simulations nowadays produce multivariate, time-varying data sets on the order of petabytes and beyond. Traditional post-processing analysis of raw data is no longer readily applicable, since storing all the data is becoming prohibitively expensive: output data size and I/O have become a bottleneck relative to ever-increasing computing speed. Hence, exploration and visualization of such extreme-scale simulation outputs pose significant challenges. This dissertation addresses these issues and suggests an alternative pathway by enabling in situ analysis, i.e., in-place analysis of data while it still resides in supercomputer memory. We embrace in situ technology and adopt simulation-time data analysis, triage, and summarization using various data transformation techniques. The proposed methods process data as the simulation generates it and employ different analysis techniques to extract important data properties efficiently. However, the amount of work that can be done in situ is often limited in time and storage, since overburdening the simulation with additional computation is undesirable. Furthermore, while some application-domain-driven analyses fit well in an in situ environment, a wide range of visual-analytics tasks require longer, iterative exploration during post-processing. To this end, we conduct in situ statistical data summarization in the form of compact probability distribution functions, which preserve essential statistical data properties and facilitate flexible and scalable post-hoc exploration. We show that the reduced stati (open full item for complete abstract)

    Committee: Han-Wei Shen (Advisor) Subjects: Computer Engineering; Computer Science
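A minimal sketch of distribution-based summarization in the spirit of entry 2 (synthetic stand-in field, not the dissertation's implementation): partition a 3D scalar field into blocks and keep only a compact, normalized histogram per block for post-hoc exploration.

```python
import numpy as np

# Stand-in for simulation output: a 64^3 scalar field.
field = np.random.default_rng(1).normal(size=(64, 64, 64))
block, bins = 16, 32
edges = np.linspace(field.min(), field.max(), bins + 1)

summaries = {}
for i in range(0, 64, block):
    for j in range(0, 64, block):
        for k in range(0, 64, block):
            data = field[i:i+block, j:j+block, k:k+block].ravel()
            hist, _ = np.histogram(data, bins=edges)
            summaries[(i, j, k)] = hist / hist.sum()   # per-block probability distribution

# Each raw block (16^3 = 4096 floats) is reduced to 32 bin probabilities.
print(len(summaries), "blocks summarized,", bins, "values each")
```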
  • 3. Brown, Kyle. Topological Hierarchies and Decomposition: From Clustering to Persistence

    Doctor of Philosophy (PhD), Wright State University, 2022, Computer Science and Engineering PhD

    Hierarchical clustering algorithms are commonly used in exploratory data analysis (EDA) and supervised learning. However, they suffer from some drawbacks, including the difficulty of interpreting the resulting dendrogram, the arbitrariness of the choice of cut to obtain a flat clustering, and the lack of an obvious way of comparing individual clusters. In this dissertation, we develop the notion of a topological hierarchy on recursively-defined subsets of a metric space. We look to the field of topological data analysis (TDA) for the mathematical background to associate topological structures such as simplicial complexes and maps of covers to clusters in a hierarchy. Our main results include the definition of a novel hierarchical algorithm for constructing a topological hierarchy, an implementation of the MAPPER algorithm and our topological hierarchies in pure Python, and a web-app dashboard for exploratory data analysis. We show that the algorithm scales well to high-dimensional data due to the use of dimensionality reduction in most TDA methods, and we analyze the worst-case time complexity of MAPPER and our hierarchical decomposition algorithm. Finally, we give a use case for exploratory data analysis with our techniques.

    Committee: Derek Doran Ph.D. (Advisor); Michael Raymer Ph.D. (Committee Member); Vincent Schmidt Ph.D. (Committee Member); Nikolaos Bourbakis Ph.D. (Committee Member); Thomas Wischgoll Ph.D. (Committee Member) Subjects: Computer Science
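For readers unfamiliar with MAPPER (entry 3), here is a stripped-down sketch of the construction, not the dissertation's implementation: project data through a filter function, cover the filter range with overlapping intervals, cluster within each interval, and connect clusters that share points.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy MAPPER sketch on random 5-D data; parameters are illustrative.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
f = X[:, 0]                                  # filter: first coordinate

n_intervals, overlap = 8, 0.25
lo, hi = f.min(), f.max()
width = (hi - lo) / n_intervals

nodes = []
for i in range(n_intervals):
    a = lo + i * width - overlap * width     # overlapping interval [a, b]
    b = lo + (i + 1) * width + overlap * width
    idx = np.where((f >= a) & (f <= b))[0]
    if len(idx) == 0:
        continue
    labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X[idx])
    for lab in set(labels) - {-1}:           # one graph node per local cluster
        nodes.append(set(idx[labels == lab]))

# Edge whenever two clusters share at least one data point.
edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
         if nodes[i] & nodes[j]}
print(len(nodes), "nodes,", len(edges), "edges in the MAPPER graph")
```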
  • 4. Bhowmik, Kowshik. Comparing Communities & User Clusters in Twitter Network Data

    MS, University of Cincinnati, 2019, Engineering and Applied Science: Computer Science

    Community detection in social networks has been a major research interest in recent years. In graphical community detection, the principal consideration is the connection between users in the network data. Document clustering, on the other hand, is a paradigm in which text documents are clustered based on their textual properties. In this thesis, we used document clustering techniques on data collected from the social networking site Twitter to cluster the users associated with the documents. We then compared the user clusters formed by document clustering with the communities detected in the graphical representation, to investigate possible correlation between the two methods. We utilized NodeXL and Gephi for collecting and visualizing the network data, respectively. For clustering users based on their tweets, we used four different feature representation techniques and two clustering algorithms.

    Committee: Anca Ralescu Ph.D. (Committee Chair); Kenneth Berman Ph.D. (Committee Member); Dan Ralescu Ph.D. (Committee Member) Subjects: Computer Science
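A minimal sketch of the comparison in entry 4, with hypothetical toy tweets rather than the thesis data: cluster users by tweet text (one of several possible feature representations), then score agreement with graph-based community labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Toy corpus: one "document" per user; labels below stand in for communities
# detected on the follower/retweet graph (e.g., via Gephi).
tweets = [
    "election results and voting policy tonight",
    "new phone camera review and specs",
    "senate debate on voting rights",
    "best laptop GPUs this year",
]
communities = [0, 1, 0, 1]

X = TfidfVectorizer().fit_transform(tweets)          # one feature representation
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand Index: 1.0 = identical partitions, ~0 = chance-level agreement.
print("ARI:", adjusted_rand_score(communities, clusters))
```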
  • 5. Sisco, Zachary. Verifying Data-Oriented Gadgets in Binary Programs to Build Data-Only Exploits

    Master of Science (MS), Wright State University, 2018, Computer Science

    Data-Oriented Programming (DOP) is a data-only code-reuse exploit technique that “stitches” together sequences of instructions to alter a program's data flow to cause harm. DOP attacks are difficult to mitigate because they respect the legitimate control flow of a program and bypass memory protection schemes such as Address Space Layout Randomization, Data Execution Prevention, and Control Flow Integrity. Existing techniques for building DOP payloads rely on a program's source code. This research explores the feasibility of constructing DOP exploits without source code—that is, using only binary representations of programs. The lack of semantic and type information introduces difficulties in identifying data-oriented gadgets and their properties. This research uses binary program analysis techniques and formal methods to identify and verify data-oriented gadgets, and to determine whether they are reachable and executable from a given memory corruption vulnerability. This information guides the construction of DOP attacks without the need for source code, showing that common off-the-shelf programs are also vulnerable to this class of exploit.

    Committee: Adam Bryant Ph.D. (Committee Co-Chair); John Emmert Ph.D. (Committee Co-Chair); Meilin Liu Ph.D. (Committee Member); Krishnaprasad Thirunarayan Ph.D. (Committee Member) Subjects: Computer Science
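As a loose illustration of one ingredient of the approach in entry 5, checking whether a candidate gadget address is reachable, here is a sketch using the angr binary-analysis framework. The binary path and address are hypothetical, and this is not the thesis's tooling or verification method.

```python
import angr

# Load a (hypothetical) binary without auto-loading shared libraries.
proj = angr.Project("./target_binary", auto_load_libs=False)
state = proj.factory.entry_state()
simgr = proj.factory.simulation_manager(state)

GADGET_ADDR = 0x401234            # hypothetical data-oriented gadget address

# Symbolically explore paths from the entry point toward the gadget.
simgr.explore(find=GADGET_ADDR)

if simgr.found:
    # The path constraints describe inputs that steer execution to the gadget.
    print("gadget reachable; constraints:", simgr.found[0].solver.constraints)
else:
    print("no path found to gadget")
```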
  • 6. Su, Yu. Big Data Management Framework based on Virtualization and Bitmap Data Summarization

    Doctor of Philosophy, The Ohio State University, 2015, Computer Science and Engineering

    In recent years, science has become increasingly data driven. Data collected from instruments and simulations is extremely valuable for a variety of scientific endeavors. The key challenge facing these efforts is that dataset sizes continue to grow rapidly. With the growing computational capabilities of parallel machines, the temporal and spatial scales of simulations are becoming increasingly fine-grained. However, data transfer bandwidths and disk I/O speeds are growing at a much slower pace, making it extremely hard for scientists to transport these rapidly growing datasets. Our overall goal is to provide a virtualization- and bitmap-based data management framework for “big data” applications. The challenges arise from four aspects. First, the “big data” problem creates a strong requirement for efficient but lightweight server-side data subsetting and aggregation, to decrease the data loading and transfer volume and to help scientists find the subsets of the data that are of interest to them. Second, data sampling, which selects a small set of samples to represent the entire dataset, can greatly decrease the data processing volume and improve efficiency; however, finding a sample accurate enough to preserve scientific data features is difficult, and estimating sampling accuracy is also time-consuming. Third, correlation analysis over multiple variables plays a very important role in scientific discovery, yet scanning through multiple variables to compute correlations is extremely time-consuming. Finally, because of the huge gap between computing and storage, a large amount of data analysis time is wasted on I/O. In an in-situ environment, generating a smaller profile of the data before it is written to disk, one that represents the original dataset and still supports different analyses, is very difficult. In our work, we proposed a data management framework to support more efficient scientific data analysis, which (open full item for complete abstract)

    Committee: Gagan Agrawal (Advisor) Subjects: Computer Science
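A minimal sketch of bitmap indexing for server-side subsetting, the core idea behind entry 6 (synthetic data, not the dissertation's framework): bin a variable, keep one bitvector per bin, and answer range queries with cheap bitwise operations instead of full data scans.

```python
import numpy as np

# Stand-in variable with one million values.
rng = np.random.default_rng(3)
temperature = rng.uniform(0.0, 100.0, size=1_000_000)

bins = np.linspace(0.0, 100.0, 11)                      # 10 value bins
bin_ids = np.digitize(temperature, bins) - 1
bitmaps = [(bin_ids == b) for b in range(10)]           # one boolean mask per bin

# Range query "50 <= temperature < 80" touches only bitmaps 5..7.
mask = bitmaps[5] | bitmaps[6] | bitmaps[7]
subset = temperature[mask]
print(f"{subset.size} of {temperature.size} elements selected")
```

Production bitmap indices additionally compress the bitvectors (e.g., with word-aligned hybrid encoding) so that the index stays far smaller than the raw data.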
  • 7. Stana, Alexandru. An Examination of Relationships Between Exposure to Sexually Explicit Media Content and Risk Behaviors: A Case Study of College Students

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2013, Media and Communication

    In spite of the prevalence of sexually explicit material in the contemporary media landscape, the effects of exposure to such material have received relatively little attention from media and communication scholars. From a Social Cognitive Theory (SCT) perspective, the present study investigated whether the consumption of sexually explicit materials predicts the adoption of risk behaviors, particularly sex- and body-image-related risk behaviors. In addition, the study focused on the psychological mechanisms, represented by the Sexual Self-Concept (SSC), that could facilitate the adoption of these risk behaviors. To address these issues, quantitative data were collected using a self-administered online survey design. In response to mounting criticism that quantitative research methods offer only truncated snapshots of individuals' interactions with sexually explicit materials, a second, qualitative data set was collected using a self-administered diary design. The analysis of the quantitative data revealed that consumption of sexually explicit media content significantly predicts SSC scores. In turn, SSC was found to be a significant predictor of the adoption of sex-related risk behaviors (sex-risk partners and sex-risk practices), but not of body-image health-related risk behaviors. A path model revealed that the SSC moderates the adoption of risk behaviors, supporting the theoretically driven hypothesis that the SSC functions as a psychological mechanism that could facilitate the adoption of risk behaviors. The path model also revealed that age and gender significantly predict the adoption of risk behaviors. Thematic analysis of the qualitative data revealed a complex and nuanced picture of participants' interactions with sexually explicit media content. The underlying assumption of most quantitative studies of pornography is that exposure to pornography is likely to have detrimental effects on (open full item for complete abstract)

    Committee: Srinivas Melkote Ph.D. (Advisor); Sandra Faulkner Ph.D. (Committee Member); Michael Horning Ph.D. (Committee Member); Michael Bradie Ph.D. (Committee Member) Subjects: Communication; Mass Media
  • 8. Knauss, Zackery. FENTANYL-INDUCED REWARD SEEKING IS SEX AND DOSE DEPENDENT AND IS PREVENTED BY D-CYSTEINE ETHYLESTER WHICH SELECTIVELY ALTERS FENTANYL CA2+ SIGNALING DYNAMICS IN THE PREFRONTAL CORTEX

    PHD, Kent State University, 2024, College of Arts and Sciences / School of Biomedical Sciences

    As of 2022, three million people in the US and sixteen million worldwide were estimated to suffer from opioid use disorder (OUD). Despite widespread efforts to increase the public availability of medical therapies for OUD, only 2.28% of people suffering from OUD will seek out and be able to sustain abstinence for at least five years. The core objectives of this work were to 1) evaluate the dose- and sex-dependent effects of fentanyl in inducing rewarding states, 2) determine the extent to which D-cysteine ethylester (D-CYSee) alters affective state and the acquisition of fentanyl-induced reward seeking, 3) characterize how the timing and concentration of fentanyl administration impact the intrinsic Ca2+ activity of neurons and astroglia from the prefrontal cortex (PFC), and 4) assess the extent to which D-CYSee alters intrinsic Ca2+ activity in both the presence and absence of fentanyl. To evaluate the effects of fentanyl in the presence and absence of D-CYSee on Ca2+ signaling dynamics in PFC neurons and astrocytes, this work details the development of new methods in real-time fluorescent imaging of intrinsic Ca2+ activity using a non-genetic chemical indicator in cells isolated from the rat PFC, in combination with post-hoc live-cell labeling of neurons and astroglia and a customizable, cell-type-informed statistical analysis pipeline with backend support for data visualization and meta-analysis. Furthermore, a general characterization of the intrinsic Ca2+ activity in this PFC preparation was conducted: first by examining the involvement of extracellular Ca2+ sources and sodium channel conductances, followed by a deeper evaluation of the role(s) of voltage-gated L-, T-, and N/P/Q-type Ca2+ channels and an assessment of NMDA, AMPA, and GABAA receptor signaling in the expression of intrinsic Ca2+ activity. The findings here support: 1) that fentanyl induces reward seeking in a concentration- and sex-dependent manner, 2) that D-CYSee could be an effective co-treatment with prescribed opioi (open full item for complete abstract)

    Committee: Devin Mueller, Ph.D. (Advisor); Derek S. Damron, Ph.D. (Advisor); Stephen J. Lewis, Ph.D. (Committee Member); Colleen Novak, Ph.D. (Committee Member); Robert Clements, Ph.D. (Committee Member); Rafaela S. C. Takeshita, D.Sc., (Other) Subjects: Behavioral Psychology; Behavioral Sciences; Cellular Biology; Neurosciences
  • 9. Li, Youjun. Semiparametric and Nonparametric Model Approaches to Causal Mediation Analysis for Longitudinal Data

    Doctor of Philosophy, Case Western Reserve University, 2024, Epidemiology and Biostatistics

    There has been a lack of causal mediation analysis methods developed for complex longitudinal data. Most existing work focuses on extensions of parametric models that are well developed for causal mediation analysis of cross-sectional data. To better handle complex, including irregular, longitudinal data, our approach takes advantage of the flexibility of penalized splines and performs causal mediation analysis under the structural equation model framework. The incorporation of penalized splines allows us to deal with repeated measures of the mediator and the outcome that are not all recorded at the same time points. The penalization avoids otherwise difficult choices in selecting knots and prevents the splines from overfitting, so that predictions for future time points are more reasonable. We also provide the formulas for identifying the natural direct and indirect effects based on our semiparametric models, with inference carried out by the delta method and Monte Carlo approximation. This frequentist approach can be straightforward and efficient when implemented under the linear mixed model (LMM) framework, but it sometimes faces convergence problems, as the random-effects components introduce complications for the optimization algorithms commonly used in statistical software. Although Bayesian modeling under the LMM is less likely to face convergence problems, thanks to Markov chain Monte Carlo (MCMC) sampling, it can be computationally expensive compared to the frequentist approach due to the nature of the MCMC algorithm. As an alternative Bayesian approach, Gaussian process regression (GPR) also has the flexibility to fit various data patterns and is more efficient than Bayesian modeling using MCMC, since the posterior distribution in GPR has a known form from which posterior samples can be directly drawn. We thus attempt to extend the standard GPR approach to allow multiple covariates of both continuous and categorical (open full item for complete abstract)

    Committee: Pingfu Fu (Committee Chair); David Aron (Committee Member); Mark Schluchter (Committee Member); Jeffrey Albert (Advisor) Subjects: Biostatistics; Statistics
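To make the direct/indirect-effect decomposition in entry 9 concrete, here is a deliberately simplified cross-sectional linear analogue on simulated data; the dissertation's penalized-spline longitudinal machinery is far richer, but in the linear case the natural effects reduce to products of regression coefficients.

```python
import numpy as np
import statsmodels.api as sm

# Simulated treatment A -> mediator M -> outcome Y (hypothetical effect sizes).
rng = np.random.default_rng(4)
n = 500
A = rng.integers(0, 2, n).astype(float)
M = 0.8 * A + rng.normal(0, 1, n)
Y = 0.5 * A + 1.2 * M + rng.normal(0, 1, n)

med = sm.OLS(M, sm.add_constant(A)).fit()                        # mediator model
out = sm.OLS(Y, sm.add_constant(np.column_stack([A, M]))).fit()  # outcome model

alpha_a = med.params[1]                      # effect of A on M
beta_a, beta_m = out.params[1], out.params[2]

# Linear-SEM case: natural effects are products/sums of coefficients.
print("Natural direct effect  :", beta_a)
print("Natural indirect effect:", alpha_a * beta_m)
```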
  • 10. Brandt, Michael. Psychotherapist Perceptions of Behavioral Treatments for MDD and Chronic Unipolar Depression

    Master of Science in Criminal Justice, Tiffin University, 2022, Forensic Psychology

    Major depressive disorder (MDD) and chronic unipolar depression present a substantial, ongoing health challenge in the United States. It is estimated that more than 8 percent of Americans will experience a severe depressive episode each year. This pervasive level of illness creates consequences extending beyond mental health into negative public health indicators, economic loss, and societal costs to individuals, families, and communities. Treatment models for depression originating within primary care include prescribed antidepressants, with behavioral treatment for cases seemingly resistant to that medication course. Approximately half a million dedicated professionals across the United States currently offer professional psychotherapy treatments to those seeking relief from depression. Much as with antidepressants, the prevalent behavioral treatments psychotherapists use to treat depression are sometimes effective and sometimes not. What is remarkable is that the mechanisms of efficacy for these treatments, both for improvement and for preservation of remission, are poorly understood. Research in the psychological literature remains inconclusive. An enhanced understanding of mediative factors for prevalent psychotherapeutic interventions such as CBT and ACT could greatly benefit continued research as well as the development of more efficient models of diagnosis and clinical care. The present study drew upon lessons learned from past research while employing qualitative analysis of the grounded-theory type to assess psychotherapists' perceptions of the prevalent treatment modalities that constitute their work processes. A codebook was developed as the genesis of a lexicon of behavioral treatment for depressive illness, and a theoretical model was devised capable of supporting the expression of this and other lexical data structures into the psychological research domain while in representation of (open full item for complete abstract)

    Committee: Johnathon Sharp (Advisor) Subjects: Psychology; Psychotherapy
  • 11. Heinzinger, Catherine. Identifying and Cardiac Risk-Stratifying Obstructive Sleep Apnea Phenotypic Clusters in a Large Clinical Cohort

    Master of Sciences, Case Western Reserve University, 2023, Clinical Research

    While sleep disorders are implicated in atrial fibrillation (AF), the interplay of physiologic alterations and symptoms remains unclear. Sleep-based subtypes can account for this complexity. We hypothesized that discrete phenotypes based on symptoms and polysomnography data from adult patients in the STARLIT Registry (n=43,433) differ in relation to incident AF (8.9%, n=3,596). Clusters, identified using latent class analysis, were used as predictors in multivariable-adjusted Cox proportional hazards models. Five clusters were identified. Over a 7.6±3.4-year follow-up period, ‘Hypoxic + Sleepy' had a 48% increased risk, ‘Apneas + Arousals' a 22% increased risk, and ‘Short Sleep + Low %REM' an 11% increased risk of incident AF compared to ‘Long Sleep + High %REM', while ‘Hypopneas' did not differ. Consistent with prior evidence of hypoxia as an AF driver and of the cardiac risk of the sleepy phenotype, this constellation of symptoms and physiologic alterations illustrates risk in the clinical setting, providing potential value as a risk prediction tool.

    Committee: Reena Mehra (Committee Chair); Anna May (Committee Member); Brittany Lapin (Committee Member); Michael Faulx (Committee Member) Subjects: Anatomy and Physiology; Biology; Biomedical Research; Biostatistics; Health; Health Care; Health Sciences; Medicine; Neurobiology; Neurology; Neurosciences; Statistics
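A sketch of the second modeling stage in entry 11 (simulated data with hypothetical column names, not the STARLIT Registry): once each patient carries a latent-class cluster label, the label enters a Cox proportional hazards model as a categorical predictor of incident AF.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated cohort: cluster = latent-class assignment, plus follow-up time
# and an AF event indicator. All values are illustrative.
rng = np.random.default_rng(5)
n = 1000
df = pd.DataFrame({
    "cluster": rng.integers(0, 5, n),
    "age": rng.normal(55, 10, n),
    "followup_years": rng.exponential(7.6, n),   # time to AF or censoring
    "incident_af": rng.integers(0, 2, n),        # event indicator
})
# One-hot encode clusters; the dropped level acts as the reference class.
df = pd.get_dummies(df, columns=["cluster"], drop_first=True, dtype=float)

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="incident_af")
cph.print_summary()   # hazard ratios for each cluster vs. the reference class
```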
  • 12. Abayateye, Philemon. A Method for Evaluating Diversity and Segregation in HOPE VI Housing Neighborhoods – Focus on Cuyahoga and Franklin Counties, Ohio

    Doctor of Philosophy, University of Toledo, 2023, Spatially Integrated Social Science

    The increase in the rate of international migration to the United States since the late 1960s, coupled with generally high rates among minority populations, altered the racial and ethnic composition of America's urban neighborhoods. The changing demography and the increasing shares of minority subpopulations underscore the salience of conducting multigroup studies of residential and socioeconomic segregation beyond the traditional white-versus-black dichotomy. Segregation based on subgroup characteristics (de facto or de jure) is problematic, particularly for racial minorities and low-income residents who are limited to moving to areas they can afford. These minority neighborhoods are associated with physical and socioeconomic disadvantage due to public and private disinvestment. The undercurrents of segregation were explored in the racial tipping point and white flight literature, where non-Hispanic white majority residents exit old inner- and central-city neighborhoods when the share of minority populations increases beyond a critical threshold. Due to strong correlations between race and income, white flight also tends to concentrate poverty in the abandoned neighborhoods. Beyond this relationship between personal choice and segregation, however, local and federal public policies have also historically been linked with segregating urban America. Federal highway programs, mortgage loan underwriting processes, suburban housing developments, and restrictive local zoning laws have created race- and income-based segregated spaces. Also, reinvestment programs aimed at revitalizing physically and socially distressed neighborhoods tend to yield minimal outcomes, often due to limited funding compared to the magnitude of the problem, lack of sustained political commitment, overemphasis on market-based ideas that alienate minorities and low-income residents, and emphasis on new urbanism housing designs associated with net losses in the public housing stock. In this dissertation (open full item for complete abstract)

    Committee: Daniel Hammel (Committee Chair); Sujata Shetty (Committee Member); Isabelle Nilsson (Committee Member); Neil Reid (Committee Member); Jami Taylor (Committee Member) Subjects: Geographic Information Science; Geography; Public Policy; Urban Planning
  • 13. Zidan, Nader. Comorbidities and Socio-economic Factors Affecting COVID-19 Severity: A Study of 776,936 Cases and 1,362,545 Controls in Indiana

    Master of Science, The Ohio State University, 2022, Computer Science and Engineering

    The COVID-19 pandemic has impacted global health. To develop an effective strategy, it is important to understand the relationship between comorbidities and COVID-19 outcomes. Equally important is broad access to large amounts of patient data related to COVID-19 for research. A cohort of 776,936 confirmed COVID-19 patients (cases) and 1,362,545 healthy controls (with negative or no COVID-19 testing) was collected from the Regenstrief Institute COVID-19 Research Data Commons (CoRDaCo) in Indiana. Demographics, clinical diagnoses, and encounters were collected for both cases and controls. Statistical analysis was conducted to determine the association of several demographic and clinical factors with COVID-19 severity. Data on county population and per capita income were obtained from the US Census Bureau. Hypothesis testing was applied to detect associations between various clinical variables and COVID-19 severity, and predictive analysis was conducted to evaluate the power of CoRDaCo EHR data, including comorbidities, to predict COVID-19 severity. We found that chronic obstructive pulmonary disease (COPD), cardiovascular disease (CVD), and type 2 diabetes (T2D) were present in 3.49%, 2.59%, and 4.76% of the COVID-19 patients, respectively. COVID-19 patients with these comorbidities have significantly higher ICU admission rates of 10.23%, 14.33%, and 11.11%, respectively, compared to the entire COVID-19 patient population (1.94%). Furthermore, patients with these comorbidities have significantly higher mortality rates of 8.22%, 13.48%, and 9.16%, respectively, compared to that of the entire COVID-19 patient population (2.24%). Socio-economic factor analysis suggests potential health disparities among counties in Indiana. Predictive analysis achieved F1-scores of 0.8011 and 0.7057 for classifying COVID-19 cases vs. controls and ICU vs. non-ICU cases, respectively. Overall, the findings indicate that elderly patients are more susceptible to COVID-19 (open full item for complete abstract)

    Committee: Xia Ning (Advisor); Huan Sun (Committee Member); Titus Schleyer (Committee Member) Subjects: Bioinformatics; Computer Science
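The hypothesis tests in entry 13 can be illustrated with a chi-square test of association between a comorbidity and ICU admission. The 2x2 counts below are reconstructed from the abstract's reported percentages for COPD and are illustrative, not the study's actual table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: COPD / no COPD; columns: ICU / non-ICU admissions.
# Counts derived from the reported rates (10.23% ICU among COPD patients,
# 1.94% overall); treat them as an illustration only.
table = np.array([[2774, 24341],
                  [12299, 737522]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}")
# A very small p-value indicates ICU admission rates differ by comorbidity status.
```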
  • 14. Synakowski, Stuart. Novel Instances and Applications of Shared Knowledge in Computer Vision and Machine Learning Systems

    Doctor of Philosophy, The Ohio State University, 2021, Electrical and Computer Engineering

    The fields of computer vision and machine learning have made enormous strides in developing models that solve tasks only humans had been capable of solving. However, the models constructed to solve these tasks came at an enormous price in computational resources and data collection. Motivated by the question of whether it is sustainable to continually develop models from scratch for every additional task humans can solve, researchers are interested in efficiently constructing new models to develop solutions to new tasks. The sub-fields of machine learning devoted to this line of research go by many names, including multi-task learning, transfer learning, and few-shot learning. All of these frameworks share the assumption that knowledge should be shared across models that solve a set of tasks. We define knowledge as the set of conditions used to construct a model that solves a given task. By shared knowledge, we refer to conditions that are consistently used to construct a set of models which solve a set of tasks. In this work, we address two sets of tasks posed in the fields of computer vision and machine learning. In solving each set of tasks, we show how our methods exhibit a novel implementation of shared knowledge, with many implications for future work in developing systems that further emulate the abilities of human beings. The first set of tasks falls within the sub-field of action analysis, specifically the recognition of intent. Instead of a data-driven approach, we construct a hand-crafted model to infer between intentional and non-intentional movement using common knowledge concepts known by humans. These knowledge concepts are ultimately used to construct an unsupervised method to infer between intentional and non-intentional movement across levels of abstraction. By layers of abstraction we mean that the model needed to solve the most abstract instances of intent recognition is useful in developing models whi (open full item for complete abstract)

    Committee: Aleix Martinez (Advisor); Abhishek Gupta (Committee Member); Yingbin Liang (Committee Member) Subjects: Artificial Intelligence; Computer Engineering; Computer Science
  • 15. Yuan, Yuan. Bayesian Conjoint Analyses with Multi-Category Consumer Panel Data

    PhD, University of Cincinnati, 2021, Arts and Sciences: Mathematical Sciences

    Motivated by the statistical analysis of real consumer panel data, we focus on unveiling consumer traits from multi-category consumer panel data. This dissertation is a sequence of statistical analyses to find households' heterogeneous characteristics in terms of “new choice” preference. We first apply a fixed effects model to see how household-level and category-level heterogeneity affect households' new-choice preference. Second, we regress the estimated individual household effects on demographic variables. Third, we fit a Bayesian binomial hierarchical model to integrate the findings from the fixed effects model and the regression model. The real-data application demonstrates that the proposed Bayesian hierarchical model successfully finds households' heterogeneous new-choice preferences.

    Committee: Hang Joon Kim Ph.D. (Committee Chair); Sanghak Lee Ph.D. (Committee Member); Seongho Song Ph.D. (Committee Member); Xia Wang Ph.D. (Committee Member) Subjects: Statistics
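A minimal sketch of a Bayesian binomial hierarchical model in the spirit of entry 15, with simulated data and hypothetical priors rather than the dissertation's specification: each household makes n[h] choice occasions, k[h] of which are "new choices", and household effects are partially pooled through a common prior.

```python
import numpy as np
import pymc as pm

# Simulated panel: per-household choice occasions and new-choice counts.
rng = np.random.default_rng(6)
H = 50
n = rng.integers(20, 100, H)
k = rng.binomial(n, rng.beta(2, 8, H))

with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.5)                # population mean (logit scale)
    sigma = pm.HalfNormal("sigma", 1.0)           # between-household spread
    eta = pm.Normal("eta", mu, sigma, shape=H)    # partially pooled household effects
    pm.Binomial("k", n=n, p=pm.math.invlogit(eta), observed=k)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=6)

print("posterior mean of mu:", idata.posterior["mu"].mean().item())
```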
  • 16. Thrush, Corey. Modern Analysis of Passing Plays in the National Football League

    Master of Arts (MA), Bowling Green State University, 2021, Mathematics and Statistics

    The National Football League is the most popular professional American football league. The league publishes and sponsors a data competition on Kaggle called the “Big Data Bowl.” The inspiration for using this dataset is commentators' claim that “analytics does not account for everything.” I answered questions that are frequently debated on networks like ESPN: “does playing in a dome improve passing?”, “are better teams better at passing?”, and “are certain formations better at passing?” With the use of nonparametric statistical techniques, I can finally give answers to these questions backed by data. This data set also has many variables, so we should consider reducing its dimension. Through Principal Component Analysis I can reduce the dimensions while keeping the interpretation and performance of the remaining data. Both the nonparametric and multivariate analyses are based upon the on-site availability of the event. However, most features in American football datasets contain some type of time-to-event data, so with the use of survival analysis I was able to examine whether different teams are better at completing passes. Finally, I discuss potential problems in my analysis. Daryl Morey, general manager for the Philadelphia 76ers, contends that “football is 10 years behind basketball and basketball is 10 years behind baseball,” a point I expand on in the context of response variables such as Expected Points Added.

    Committee: John Chen (Advisor); Wei Ning (Committee Member) Subjects: Statistics
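Two of the steps in entry 16 can be sketched briefly with simulated stand-in data (the thesis used Big Data Bowl plays): a nonparametric Mann-Whitney U test for the dome-vs-outdoors question, and PCA for dimension reduction.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in per-play passing outcomes (e.g., EPA-like values); hypothetical.
rng = np.random.default_rng(7)
dome = rng.normal(0.08, 1.0, 400)
outdoors = rng.normal(0.02, 1.0, 600)

# Nonparametric test: is dome passing stochastically better?
stat, p = mannwhitneyu(dome, outdoors, alternative="greater")
print(f"Mann-Whitney U={stat:.0f}, p={p:.3f}")

# PCA on a many-variable play dataset (20 stand-in tracking features).
features = rng.normal(size=(1000, 20))
pca = PCA(n_components=5).fit(StandardScaler().fit_transform(features))
print("variance explained:", pca.explained_variance_ratio_.round(3))
```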
  • 17. Matuk, James. Bayesian Modelling Frameworks for Simultaneous Estimation, Registration, and Inference for Functions and Planar Curves

    Doctor of Philosophy, The Ohio State University, 2021, Statistics

    Functional Data Analysis (FDA) and Statistical Shape Analysis (SA) are fields in which the data objects of interest vary over a continuum, such as univariate functions and planar curves. While observations are typically measured and stored discretely, there are inherent benefits in acknowledging the infinite-dimensional processes from which the data arise. The typical statistical goals in FDA and SA are summarization, visualization, inference, and prediction. However, the geometric structure of the data presents unique challenges. In FDA, the observations exhibit two distinct forms of variability: amplitude, which describes the magnitude of features, and phase, which describes the relative timing of amplitude features. In SA, objects are analyzed through their shape, which is a quantity that remains unchanged if the object is scaled, translated, rotated in space, or reparametrized (referred to as shape-preserving transformations). Within both fields, analysis usually follows unrelated sequential steps. First, an estimation step is used to obtain an infinite-dimensional representation of the discretely measured observations. Then, a registration step is used to decouple amplitude and phase variability in the FDA setting, and to remove variability associated with shape-preserving transformations in the SA setting. Finally, inference can be performed based on the registration results. There are two well-documented drawbacks to this sequential pipeline. (1) There is no formal uncertainty propagation between steps, which leads to overconfidence in inferential results. (2) There is a lack of flexibility under realistic observation regimes, such as sparsely sampled or fragmented observations. Previous methods that have attempted to overcome these drawbacks suffer from being too rigid or fail to account for misregistration of observations. In this thesis, we develop flexible modelling frameworks for FDA and SA that simultaneously perform t (open full item for complete abstract)

    Committee: Oksana Chkrebtii (Advisor); Sebastian Kurtek (Advisor); Peter Craigmile (Committee Member); Radu Herbei (Committee Member) Subjects: Statistics
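The amplitude/phase distinction in entry 17 can be shown with a toy numerical example (this illustrates the concept only, not the thesis's Bayesian registration machinery): two bumps with identical amplitude but different timing are aligned by composing one with a warping function gamma.

```python
import numpy as np

t = np.linspace(0, 1, 200)
f1 = np.exp(-((t - 0.4) ** 2) / 0.005)       # bump centered at t = 0.4
f2 = np.exp(-((t - 0.6) ** 2) / 0.005)       # same bump, shifted in phase

# A crude warping with gamma(0)=0, gamma(1)=1; the exponent is chosen so
# gamma(0.4) ~= 0.6, pulling the second bump's timing back to t = 0.4.
gamma = t ** 0.557
f2_registered = np.interp(gamma, t, f2)      # f2 composed with gamma

print("pre-alignment  L2 distance:", np.linalg.norm(f1 - f2).round(2))
print("post-alignment L2 distance:", np.linalg.norm(f1 - f2_registered).round(2))
```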
  • 18. Bard, Ari. Modeling and Predicting Heat Transfer Coefficients for Flow Boiling in Microchannels

    Master of Sciences, Case Western Reserve University, 2021, EMC - Mechanical Engineering

    Flow boiling has become a reliable mode of adapting to larger power densities and greater functionality because it utilizes both the latent and sensible heat contained within a specified coolant. There are currently few available tools proven reliable for predicting heat transfer coefficients during flow boiling in microchannels. The most popular methods rely on semi-empirical correlations derived from experimental data but can only be applied to a narrow subset of testing conditions. This study uses multiple data science methods to accurately predict the heat transfer coefficient during flow boiling in microchannels on a database consisting of 16,953 observations collected across 50 experiments using 12 working fluids. The support vector machine model performed best, with a Mean Absolute Percentage Error (MAPE) of 11.3%. The heat flux, vapor-only Froude number, and quality proved to be especially significant variables across 90% of over 110 different models.

    Committee: Chirag Kharangate PHD (Advisor); Brian Maxwell PHD (Committee Member); Roger French PHD (Committee Member) Subjects: Mechanical Engineering
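A sketch of the best-performing approach in entry 18, on synthetic stand-in data rather than the 16,953-observation flow-boiling database: a support vector machine regressor predicting a heat-transfer-coefficient-like target, scored with MAPE. Feature names and the generating function are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

# Stand-in features (e.g., heat flux, vapor-only Froude number, quality)
# and a synthetic heat transfer coefficient target in W/m^2-K.
rng = np.random.default_rng(8)
X = rng.uniform(size=(2000, 3))
y = 5000 + 20000 * X[:, 0] + 3000 * X[:, 1] * X[:, 2] + rng.normal(0, 500, 2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=0.1)).fit(X_tr, y_tr)

mape = mean_absolute_percentage_error(y_te, model.predict(X_te))
print(f"MAPE: {100 * mape:.1f}%")
```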
  • 19. Cotra, Aditya Kousik. Trend Analysis on Artificial Intelligence Patents

    MS, University of Cincinnati, 2021, Engineering and Applied Science: Computer Science

    This paper is a study of trends in the patenting of Artificial Intelligence technologies in the US, using natural language processing techniques. Patents contain significant knowledge of the current landscape of commercial technology, and natural language techniques can extract useful information from patents, reducing the human effort needed to analyze them. Around 104,000 patents related to artificial intelligence technologies filed in the US from 2000 to 2018 were collected from the WIPO database. Topic modeling was performed on the extracted text data. Patent applications were then categorized into five groups (Companies, Individuals, Research Institutes, Government, and Universities), and applicant trends in the field of Artificial Intelligence were compared across these groups.

    Committee: Anca Ralescu Ph.D. (Committee Chair); Kenneth Berman Ph.D. (Committee Member); Dan Ralescu Ph.D. (Committee Member) Subjects: Computer Science
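The topic-modeling step in entry 19 can be sketched on a toy corpus (the study used ~104,000 patent texts; the abstract snippets below are invented): a bag-of-words representation fed to Latent Dirichlet Allocation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for patent abstracts.
abstracts = [
    "neural network training method for image recognition",
    "speech recognition using deep learning acoustic models",
    "autonomous vehicle control system with sensor fusion",
    "reinforcement learning agent for robotic control",
]

vec = CountVectorizer(stop_words="english").fit(abstracts)
X = vec.transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Show the top words per discovered topic.
terms = vec.get_feature_names_out()
for t, dist in enumerate(lda.components_):
    top = [terms[i] for i in dist.argsort()[-5:][::-1]]
    print(f"topic {t}:", ", ".join(top))
```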
  • 20. Zhang, Jianzhe. Development of an Apache Spark-Based Framework for Processing and Analyzing Neuroscience Big Data: Application in Epilepsy Using EEG Signal Data

    Master of Sciences, Case Western Reserve University, EECS - Computer and Information Sciences

    Brain functional connectivity measures are used to study interactions between brain regions in various neurological disorders such as Alzheimer's disease and epilepsy. In particular, high-resolution electrophysiological signal data recorded from intracranial electrodes, such as stereotactic electroencephalography (SEEG) data, is often used to characterize the properties of brain connectivity in neurological disorders. For example, SEEG data is used to lateralize the epileptogenic zone and characterize seizure networks in epilepsy. However, the large volume and complexity of signal data pose several computational challenges for efficient and scalable analysis. To address these challenges, we have developed an integrated platform called the Neuro-Integrative Connectivity (NIC) platform, which integrates and streamlines multiple data processing and analysis steps into a single tool. In particular, in this thesis we developed a suite of new approaches covering signal data format, indexing structure, and Apache Spark libraries to support efficient and scalable signal data management for applications in neurological disorders such as epilepsy. Our evaluations demonstrate the utility of Apache Spark in supporting neuroscience Big Data applications; however, our results also demonstrate that Apache Spark is not well suited to all types of computational tasks associated with signal data management. Therefore, the overall objective of this thesis is to identify the specific computational tasks that benefit from main memory-based Apache Spark methods in neuroscience Big Data applications. The new NIC platform developed in this thesis is a significant resource for the brain connectivity research community, with applications in real-world settings for advancing research in neurological disorders using signal data.

    Committee: Satya Sahoo (Advisor); Jing Li (Committee Chair); An Wang (Committee Member) Subjects: Bioinformatics; Computer Science
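To make the Spark angle of entry 20 concrete, here is a minimal PySpark sketch with a hypothetical schema (not the NIC platform's code): given windowed SEEG samples as rows, compute per-window Pearson correlation between two channels as a simple functional-connectivity measure.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("seeg-connectivity").getOrCreate()

# Toy windowed samples: (window_id, value_ch1, value_ch2). In practice these
# rows would be loaded from an indexed signal-data format, not generated.
df = spark.createDataFrame(
    [(w, float(w + i % 7), float(2 * (i % 7) - w))
     for w in range(4) for i in range(256)],
    ["window_id", "value_ch1", "value_ch2"],
)

# Per-window Pearson correlation between the two channels.
connectivity = df.groupBy("window_id").agg(
    F.corr("value_ch1", "value_ch2").alias("pearson_r")
)
connectivity.show()
spark.stop()
```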