Search Results

(Total results 27)


  • 1. Arthur, Eugeniah Amma Pemah Projection Pursuit Indices Based on Weighted L2 Statistics for Testing Normality

    Doctor of Philosophy (Ph.D.), Bowling Green State University, 2023, Mathematics/Mathematical Statistics

    Projection pursuit is the process of finding interesting d-dimensional projections of an n × p data matrix, where n is the number of rows and p is the number of variables (d < p). To find an interesting structure, a scalar function called the projection pursuit index is optimized. In the literature, projection pursuit indices such as those proposed in Friedman (1987), Posse (1995a), and Perisic and Posse (2005), among others, were all based on finding the projection that deviated the most from the standard normal distribution. Thus, a goodness-of-fit test statistic for normality is a plausible projection pursuit index. However, most of these goodness-of-fit test statistics have limitations, such as not being rotation invariant and exhibiting increased computational complexity in the multivariate case. Hence, more robust goodness-of-fit test statistics for multivariate data must be considered. In this work, projection pursuit indices based on recent and more robust test statistics for testing normality, namely the Baringhaus, Henze, Epps, and Pulley (BHEP), energy, and Gaussian kernel energy (GKE) test statistics, are proposed as better indices for projection pursuit. The BHEP and GKE test statistics depend on a tuning parameter, so novel ways of selecting tuning parameters based on cross-validation techniques are presented. Furthermore, the proposed projection pursuit indices are evaluated to determine whether they exhibit the ideal behaviors of a projection pursuit index and are able to identify interesting hidden structures in both simulated and real datasets.
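
    Of the statistics mentioned, the BHEP statistic is the most direct to write down: it compares the empirical characteristic function of standardized data against that of the standard normal under a Gaussian weight. The sketch below is a minimal NumPy illustration of the classical BHEP statistic, not code from the dissertation; the tuning parameter `beta` and the data are illustrative assumptions.

```python
import numpy as np

def bhep_statistic(x, beta=1.0):
    """Classical BHEP test statistic for multivariate normality.

    Computed on standardized (centered and whitened) data; larger values
    indicate stronger deviation from normality.
    """
    n, d = x.shape
    xc = x - x.mean(axis=0)
    # Whiten with the sample covariance via its Cholesky factor.
    L = np.linalg.cholesky(np.cov(xc, rowvar=False))
    y = xc @ np.linalg.inv(L).T
    # Pairwise squared distances among standardized observations.
    sq = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    b2 = beta ** 2
    term1 = np.exp(-b2 * sq / 2.0).sum() / n
    norms = (y ** 2).sum(axis=1)
    term2 = 2.0 * (1 + b2) ** (-d / 2) * np.exp(-b2 * norms / (2 * (1 + b2))).sum()
    term3 = n * (1 + 2 * b2) ** (-d / 2)
    return term1 - term2 + term3
```

    Used as a projection pursuit index, this statistic would be evaluated on each candidate projection of the data, and the maximizing projection would be the most "non-normal" and hence the most interesting.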

    Committee: Maria Rizzo Ph.D. (Committee Chair); Angela Nelson Ph.D. (Other); Umar Islambekov Ph.D. (Committee Member); Wei Ning Ph.D. (Committee Member) Subjects: Statistics
  • 2. Stalker, William The Effect of Fractal Dimensionality on Behavioral Judgments of Built Environments

    Master of Science (MS), Wright State University, 2022, Human Factors and Industrial/Organizational Psychology MS

    This research examines the effects of fractal dimensionality on ratings of beauty, relaxation, and interest when fractal patterns are incorporated into a built space. Previous findings suggest that fractal patterns can be used to mimic the beneficial psychological and physiological effects that arise from viewing nature. This research focuses on the impact of fractal patterns presented within urban environments. The findings here are primarily consistent with previous research: medium-D patterns are preferred over the other pattern complexities; low-D patterns are consistently rated as more relaxing; and high-D patterns are rated as more interesting than low-D patterns, though the difference between high D and medium D might be smaller than previously thought. These collective findings support further investigation of fractal patterns as a means to promote mental enrichment for inhabitants and to reduce stress in urban environments.

    Committee: Assaf Harel Ph.D. (Advisor); Joori Suh Ph.D. (Committee Member); Ion Juvina Ph.D. (Committee Member) Subjects: Psychology
  • 3. Dozier, Robbie Navigating the Metric Zoo: Towards a More Coherent Model For Quantitative Evaluation of Generative ML Models

    Master of Sciences, Case Western Reserve University, 2022, EECS - Computer and Information Sciences

    This thesis studies a family of high-dimensional generative procedures modeled by Deep Generative Models (DGMs). These models can sample from complex manifolds to create realistic images, video, audio, and more. In prior work, generative models were evaluated using likelihood criteria. However, likelihood has been shown to suffer from the Curse of Dimensionality, and some generative architectures such as Generative Adversarial Networks (GANs) do not admit a likelihood measure. While some other metrics for GANs have been proposed in the literature, there has not been a systematic study and comparison between them. In this thesis I conduct the first comprehensive empirical analysis of these generative metrics, comparing them across several axes including sample quality, diversity, and computational efficiency. Second, I propose a new metric which employs the concept of typicality from information theory and compare it to existing metrics. My work can be used to answer questions about when to use which kind of metric when training DGMs.
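
    The notion of typicality from information theory can be illustrated with a toy one-dimensional Gaussian model: a batch of samples is declared epsilon-typical when its average negative log-likelihood lies within epsilon of the model's differential entropy. This is only a hedged sketch of the underlying concept, not the metric proposed in the thesis; the threshold `eps` and the toy distributions are assumptions for illustration.

```python
import math
import random

def gaussian_nll(x, mu=0.0, sigma=1.0):
    # Negative log-density of N(mu, sigma^2) at x.
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

def is_typical(batch, mu=0.0, sigma=1.0, eps=0.5):
    """A batch is epsilon-typical if its average negative log-likelihood
    is within eps of the model's differential entropy."""
    entropy = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
    avg_nll = sum(gaussian_nll(x, mu, sigma) for x in batch) / len(batch)
    return abs(avg_nll - entropy) < eps

random.seed(0)
in_dist = [random.gauss(0, 1) for _ in range(1000)]   # matches the model
off_dist = [random.gauss(5, 1) for _ in range(1000)]  # shifted away from it
```

    The same check applies, in principle, to a deep generative model whenever its log-likelihood and an entropy estimate are available.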

    Committee: Soumya Ray (Advisor); Michael Lewicki (Committee Member); Harold Connamacher (Committee Member) Subjects: Artificial Intelligence; Computer Science
  • 4. Islam, Md Rashedul Perpetrator Workplace Aggression: Development of a Perpetrator Aggression Scale (PAS)

    Doctor of Philosophy (PhD), Wright State University, 2022, Human Factors and Industrial/Organizational Psychology PhD

    Perpetrator workplace aggression has traditionally been considered a uni-dimensional construct from the uni-dimensional perspective. The most popular and widely used scale for assessing perpetrator workplace aggression, the Interpersonal Deviance Scale (IDS; Bennett & Robinson, 2000), has only seven items (i.e., seven content areas), which limits its content-related and construct-related validity. Recently, researchers have suggested that perpetrator workplace aggression may be a construct with a general factor at the top (Sackett & DeVore, 2001); however, this general factor can be less clear for a more complex model (Marcus et al., 2016). Using three samples (N = 271, 337, & 264), this research found that perpetrator workplace aggression was also a uni-dimensional construct from the multi-dimensional perspective and that the general factor was very clear for a complex model, and it developed a new scale with a higher level of content-related validity (i.e., 24 different content areas of perpetrator workplace aggression) and construct-related validity (established by developing a large nomological network). In addition, the new scale showed a higher level of internal consistency and substantive validity. Hence, I recommend that researchers and practitioners use this new scale in the future when assessing perpetrator workplace aggression.

    Committee: Nathan A. Bowling Ph.D. (Advisor); David M. LaHuis Ph.D. (Committee Member); Corey E. Miller Ph.D. (Committee Member); Brian D. Lyons Ph.D. (Committee Member) Subjects: Occupational Psychology; Organizational Behavior; Psychology
  • 5. Choudhary, Rishabh Construction and Visualization of Semantic Spaces for Domain-Specific Text Corpora

    MS, University of Cincinnati, 2021, Engineering and Applied Science: Electrical Engineering

    An important objective in Natural Language Processing is representing pieces of text as numerical representations through the process of text embedding. Recent language models and text encoders have proved successful in generating high-quality embeddings that perform well on tasks such as sentiment analysis, question answering, and summarization. Many of these models are available pre-trained on enormous amounts of data, providing downstream applications with general-purpose semantic spaces. A useful application of text embeddings is creating a semantic space on a specific topic based on a specialized dataset. This semantic space can be used to track the trajectory of a piece of text to see where the “train of thought” is going. In this type of application, the performance of embeddings on downstream tasks is not as important as the relationship between the embeddings themselves. Specifically, it is important for semantically similar units of text to have embeddings that are close to each other. Most text embedding methods produce embeddings in high-dimensional spaces, with dimensionality ranging from a few hundred to thousands. However, it is often useful to visualize semantic spaces in very low dimensions, which requires the use of dimensionality reduction methods. It is not clear which language models and which methods of dimensionality reduction would work well in these cases. This thesis provides a method of evaluating combinations of embedding methods and dimensionality reduction methods. Using the results from this analysis, a method of creating a cognitive map from a small and specialized dataset is implemented and evaluated.
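
    One simple way to score a combination of text encoder and dimensionality reduction method, in the spirit described above, is k-nearest-neighbor preservation: the fraction of each point's nearest neighbors in the original space that remain its neighbors after reduction. A minimal NumPy sketch follows; the random "embeddings" and the crude coordinate-slice "reduction" are placeholders, not the models or methods evaluated in the thesis.

```python
import numpy as np

def knn_sets(x, k):
    # Index set of each point's k nearest neighbors (excluding itself).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return [set(np.argsort(row)[:k]) for row in d]

def neighborhood_preservation(high, low, k=10):
    """Average overlap between k-NN sets before and after reduction;
    1.0 means local structure is perfectly preserved."""
    hi, lo = knn_sets(high, k), knn_sets(low, k)
    return float(np.mean([len(a & b) / k for a, b in zip(hi, lo)]))

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 50))   # stand-in for 50-d text embeddings
proj = emb[:, :2]                  # a deliberately crude 2-d "reduction"
score = neighborhood_preservation(emb, proj)
```

    A higher score means semantically close texts stay close after reduction, which is exactly the property the abstract says matters most for this application.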

    Committee: Ali Minai Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Yizong Cheng Ph.D. (Committee Member); Simona Doboli Ph.D. (Committee Member) Subjects: Artificial Intelligence
  • 6. Ray, Sujan Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform

    PhD, University of Cincinnati, 2020, Engineering and Applied Science: Computer Science and Engineering

    Nowadays it is easy to amass large collections of healthcare data, especially because of relatively cheap wearable devices. Subsequently, we can mine clinical data to acquire meaningful information, which helps in making better decisions and improves the healthcare sector by minimizing costs. Healthcare datasets available in the public domain have many features, and it is impossible to manually identify the factors that contribute to a disease [1]. Therefore, it is necessary to use Machine Learning (ML) algorithms to identify, from a huge number of features, the most important ones for detecting the occurrence of disease. Thus, we can predict disease more accurately with a model trained on only the top features of the dataset. Because healthcare data comes from different sources and in different sizes, there is a need for a cloud-based platform. The first aim of this dissertation is to focus on the important field where big data is used in health care to diagnose diseases before they occur or to avoid them. Breast Cancer (BC) is the second most common cancer in women after skin cancer and has become a major health issue. As a result, it is very important to diagnose BC correctly and to categorize tumors into malignant or benign groups. ML techniques have unique advantages and are widely used to analyze complex BC datasets and predict the disease. The Wisconsin Diagnostic Breast Cancer (WDBC) dataset has been used by researchers in this field to develop predictive models for BC. In this dissertation, we propose a method for analyzing and predicting BC on the same dataset using Apache Spark. The experiments are executed on a Hadoop cluster, a cloud platform provided by the Electrical Engineering and Computer Science (EECS) department at the University of Cincinnati. Our results show that selecting the right features significantly improves the accuracy of predicting BC. 
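
    The feature selection step described above can be caricatured without any cluster infrastructure: rank features by the strength of their association with the binary label and keep the top k. The sketch below uses a simple correlation ranking on synthetic data; it is not the Spark/Hadoop pipeline from the dissertation, and the planted feature indices and shifts are illustrative assumptions.

```python
import numpy as np

def top_k_features(X, y, k):
    """Rank features by absolute correlation with a binary label and
    return the indices of the k strongest ones."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(corr)[::-1][:k]

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, n)          # benign / malignant labels
X = rng.normal(size=(n, 30))       # 30 candidate features
X[:, 3] += 2.0 * y                 # planted informative features
X[:, 17] -= 1.5 * y
selected = top_k_features(X, y, 2)
```

    A model trained on only the selected columns then stands in for the "top features" classifier the abstract describes.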
The s (open full item for complete abstract)

    Committee: Marc Cahay Ph.D. (Committee Chair); Dharma Agrawal D.Sc. (Committee Member); Rui Dai Ph.D. (Committee Member); Wen-Ben Jone Ph.D. (Committee Member); Manish Kumar Ph.D. (Committee Member); Carla Purdy Ph.D. (Committee Member) Subjects: Computer Science
  • 7. Girish, Deeptha Action Recognition in Still Images and Inference of Object Affordances

    PhD, University of Cincinnati, 2020, Engineering and Applied Science: Electrical Engineering

    Action recognition is an important computer vision task that focuses on identifying the behavior or action performed by humans in images. Action recognition using wearable sensors and videos is a well-studied and well-established topic. This thesis focuses on action recognition in still images, a new and challenging area of research: understanding motion from static images is difficult because the spatio-temporal features most commonly used for predicting actions are not available. Action recognition in still images has a variety of applications, such as searching for frames in videos by action, searching a database of images using an action label, surveillance, and robotic applications. It can also be used to give a more meaningful description of an image. The goal of this thesis is to perform action recognition in still images and infer object affordances by characterizing the interaction between the human and the object. Object affordance refers to determining the use of an object based on its physical properties. The main idea is to learn high-level concepts such as action and object affordance by extracting information about the objects and their interactions in an image.

    Committee: Anca Ralescu Ph.D. (Committee Chair); Kenneth Berman Ph.D. (Committee Member); Rashmi Jha Ph.D. (Committee Member); Wen-Ben Jone Ph.D. (Committee Member); Dan Ralescu Ph.D. (Committee Member) Subjects: Electrical Engineering
  • 8. Lu, Tien-hsin SqueezeFit Linear Program: Fast and Robust Label-aware Dimensionality Reduction

    Master of Mathematical Sciences, The Ohio State University, 2020, Mathematical Sciences

    We introduce the SqueezeFit linear program as a fast and robust dimensionality reduction method. This program is inspired by both the SqueezeFit semi-definite program [10] and scGeneFit [3], which is a linear program version of SqueezeFit that has been used to classify single cell RNA-sequence data with a given structured partition. The original SqueezeFit semi-definite program has a strong theoretical background but it exhibits slow runtimes with large data sets. In contrast, scGeneFit performs efficiently and robustly with scRNA-seq data given either flat or hierarchical label partitions, but it does not have much theoretical justification for its performance. The SqueezeFit linear program fills this computational and theoretical gap. After providing new theoretical guarantees, we illustrate the performance of the SqueezeFit linear program on real-world gene expression data.

    Committee: Dustin G. Mixon Dr. (Advisor); Dongbin Xiu Dr. (Committee Member) Subjects: Mathematics
  • 9. Zhang, Yuankun (Ultra-)High Dimensional Partially Linear Single Index Models for Quantile Regression

    PhD, University of Cincinnati, 2018, Arts and Sciences: Mathematical Sciences

    Nonparametric modeling tends to capture the underlying structures in the data without imposing strong model assumptions, providing powerful data-driven approaches to fit a flexible model to the data. Single-index models are useful and appealing tools that preserve flexibility and interpretability while overcoming “curse of dimensionality” problems in nonparametric regression. In this dissertation, we consider partially linear single-index models for quantile regression. This set of semi-parametric models allows some covariates to enter in linear form and other covariates to enter through a nonparametric term, reflecting non-linear features in modeling the conditional quantiles of the response variable. We first develop efficient estimation and variable selection for partially linear single-index quantile models in fixed dimension. We use spline smoothing with a B-spline basis to estimate the nonparametric component and adopt non-convex penalties to select variables simultaneously. We study the theoretical properties of the resulting estimators and establish the “oracle property” for penalized estimation. With the rise of new technologies for data collection and storage, high dimensional data spring up and become available in various scientific fields. Researchers often face the new challenge that the dimension of the explanatory variables, p, may increase with the sample size, n, or potentially become much larger than n. For problems of high to ultra-high dimensionality, data are likely to be heterogeneous and the underlying model is prone to be nonlinear, and variable selection plays a critical role in the dimension reduction and modeling process. Thus, we propose a penalized estimation under the sparsity assumption for partially linear single-index quantile models in high dimension. 
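
    The objective shared by all of these quantile models is the check (pinball) loss, whose sample average is minimized by the tau-th sample quantile. The small NumPy illustration below uses a grid search over the sample values purely for exposition; the dissertation's estimators minimize this loss over spline and index parameters instead.

```python
import numpy as np

def pinball_loss(u, tau):
    # Quantile-regression "check" loss: tau*u for u >= 0, (tau-1)*u for u < 0.
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def sample_quantile_by_loss(y, tau):
    """The constant minimizing average pinball loss is a tau-th sample quantile."""
    grid = np.sort(y)
    losses = [pinball_loss(y - q, tau).mean() for q in grid]
    return grid[int(np.argmin(losses))]
```

    Replacing the constant with a partially linear single-index function of the covariates gives the conditional-quantile models studied in the dissertation.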
We further investigate ultra-high dimensional penalized partially linear single-index quantile models in which both linear components and single-index vari (open full item for complete abstract)

    Committee: Dan Ralescu Ph.D. (Committee Chair); Yan Yu Ph.D. (Committee Chair); Emily Kang Ph.D. (Committee Member); Ju-Yi Yen (Committee Member) Subjects: Statistics
  • 10. Galbincea, Nicholas Critical Analysis of Dimensionality Reduction Techniques and Statistical Microstructural Descriptors for Mesoscale Variability Quantification

    Master of Science, The Ohio State University, 2017, Materials Science and Engineering

    The transition of newly developed materials from the laboratory to the manufacturing floor is often hindered by the task of quantifying the material's inherent variability, which spans from the atomistic to the macroscale. This impedance is coupled with the task of linking the variability observed at these length scales and ultimately correlating this multidimensional variance to the macroscale performance of the material. This issue has led to the development of statistical and mathematical frameworks for evaluating material variability. In this work, the author employs one such methodology for mesoscale variability quantification, with the goal of further exploring and enhancing this framework while simultaneously presenting the pathway as a computational design tool. This stochastic representation of microstructure ties the characterization of a material to the topology of its structure and allows for digital representation via statistical volume elements (SVEs). Quantification of the topology of these SVEs can be achieved through statistical microstructural descriptors (SMDs), which inevitably leads to an extremely high order data set for each microstructure realization. This high order data set can then be dimensionally reduced via kernel principal component analysis (KPCA), allowing the variance of the microstructure to be observed through microstructure visualizations. These visualizations can be enhanced through the use of the 1-way multivariate analysis of variance (1-way MANOVA). The reduced order SMD data set can then be combined with property results determined via finite element analysis (FEA) to produce microstructure-property maps, allowing both the microstructure and property variance to be observed graphically. 
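
    The KPCA step in this pipeline can be sketched in a few lines: form an RBF Gram matrix over the descriptor vectors, double-center it in feature space, and project onto the leading eigenvectors. The NumPy sketch below is a generic illustration; the kernel width `gamma` and the toy data are assumptions, whereas the thesis applies KPCA to SMD vectors computed from SVEs.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=0.1):
    """Kernel PCA with an RBF kernel: Gram matrix, double-centering,
    projection onto the leading eigenvectors."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                    # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)   # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    # Scale eigenvectors by sqrt(eigenvalue) to get projected coordinates.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

    The two or three returned coordinates per microstructure realization are what make the variance visualizations described above possible.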
Lastly, predictive models can be trained on the reduced order SMD data sets and property results utilizing the machine learning te (open full item for complete abstract)

    Committee: Stephen Niezgoda Dr. (Advisor); Dennis Dimiduk Dr. (Committee Member); Soheil Soghrati Dr. (Committee Member) Subjects: Materials Science; Mathematics; Statistics
  • 11. Abdel-Rahman, Tarek Mixture of Factor Analyzers (MoFA) Models for the Design and Analysis of SAR Automatic Target Recognition (ATR) Algorithms

    Doctor of Philosophy, The Ohio State University, 2017, Electrical and Computer Engineering

    We study the problem of target classification from Synthetic Aperture Radar (SAR) imagery. Target classification using SAR imagery is a challenging problem due to large variations of target signature as the target aspect angle changes. Previous work on modeling wide angle SAR imagery has shown that point features, extracted from scattering center locations, result in a high dimensional feature vector that lies on a low dimensional manifold. We propose to use rich probabilistic models for these target manifolds to analyze classification performance as a function of Signal-to-noise ratio (SNR) and Bandwidth. We employ Mixture of Factor Analyzers (MoFA) models to approximate the target manifold locally, and use error bounds for the estimation and analysis of classification error performance. We compare our performance predictions with the empirical performance of practical classifiers using simulated wideband SAR signatures of civilian vehicles. We then extend this work to design optimal maximally discriminative projections (MDP) for the manifold structured data. An optimization algorithm is proposed that maximizes the Kullback Leibler (KL)-divergence between two mixture models through optimizing the closed-form "Variational Approximation" of the KL-divergence between the MoFA models. We then propose to generalize our MDP dimensionality reduction technique to multi-class using non-linear constrained optimization through minimax quasi-Newton methods. The proposed MDP algorithm is compared to existing dimensionality reduction techniques using simulated Civilian Vehicles datadome dataset and real-world MSTAR data.

    Committee: Emre Ertin (Advisor); Randolph Moses (Committee Member); Bradley Clymer (Committee Member) Subjects: Electrical Engineering
  • 12. Sulecki, Nathan Characterizing Dimensionality Reduction Algorithm Performance in terms of Data Set Aspects

    Bachelor of Science (BS), Ohio University, 2017, Computer Science

    Dimensionality reduction algorithms are used in every field that uses data, for purposes ranging from facilitating data visualization to reducing the amount of data that must be considered in analysis. Research has established that there is no definitive best algorithm—any algorithm can be the front runner depending on what dataset is used. Despite this, relatively little research has been conducted into dataset aspects that can predict algorithm performance. This research aims to serve as a foundational work for answering this question. Three dataset aspects (number of dimensions, continuity of dimensions, multivariate normality) were selected as potential factors that can affect performance, based on how most algorithms vary in approach. Data sets were selected or created to cover a spread of the first two aspects (small, medium, and large numbers of dimensions, and binary, n-ary, continuous, and mixed continuity) and then tested to see if they exhibit multivariate normality. These data sets were then reduced using one recent and four well-known dimensionality reduction algorithms: SMA, PCA, mRMR, kPCA, and nlPCA. The reduced data was used as input to a range of classification and clustering algorithms, and the performance of these algorithms was measured and compared. It was found that, for the tested datasets, neither continuity nor dimensionality served as a predictor of algorithm performance; however, results point to another, previously unexplored way of characterizing data sets that could be a significant predictor of performance.

    Committee: Ronaldo Vigo (Advisor) Subjects: Computer Science
  • 13. Hogrebe, Nathaniel Modifying Cellular Behavior Through the Control of Insoluble Matrix Cues: The Influence of Microarchitecture, Stiffness, Dimensionality, and Adhesiveness on Cell Function

    Doctor of Philosophy, The Ohio State University, 2016, Biomedical Engineering

    While the soluble biochemical environment has traditionally been viewed as the most important determinant of cell behavior, accumulating evidence indicates that insoluble cues from a cell's surroundings are crucial to a variety of cellular processes. Differences in matrix properties such as stiffness, adhesiveness, and microarchitecture can influence cell shape, cytoskeletal organization, and adhesion formation. These changes can modify enzymatic pathways, control the localization of transcription factors, and even directly modulate gene expression to change overall cell behavior in response to a cell's physical surroundings. While the effects of various insoluble cues have been successfully demonstrated in 2D culture, there has been a lack of fibrous, biomimetic substrates suitable for systematically studying the role of these insoluble cues within a more physiological 3D environment. To this end, we developed and characterized a two component self-assembling peptide (SAP) system that possessed tunable stiffness (controlled via KFE-8 concentration) and RGD binding site density (controlled via KFE-RGD concentration) as well as a fibrous microarchitecture similar to collagen. In contrast to other synthetic 3D matrices such as polyethylene glycol (PEG) or alginate gels which constrict cell spreading, cells encapsulated within these gels were able to adopt non-spherical morphologies similar to those of cells within hydrogels made of natural ECM components. Using this system, we observed that the presence of the RGD binding site was required for both human mesenchymal stem cells (hMSCs) and human umbilical vein endothelial cells (HUVECs) to initially spread within these SAP gels. Furthermore, the extent of this spreading and HUVEC microvascular network (MVN) formation was dictated by stiffness, but each cell type had a different optimal stiffness that was most conducive to these non-spherical morphologies. 
This culture system was then used to explore the differe (open full item for complete abstract)

    Committee: Keith Gooch (Advisor) Subjects: Biomedical Engineering
  • 14. Richards, Christopher Ed Mieczkowski's Contradictory Cues in Dimensionality in Painting and Sculpture

    MA, Kent State University, 2016, College of the Arts / School of Art

    The aim of this thesis is to show how Edwin Mieczkowski's exploration of visual perception created contradictory dimensional cues in painting and sculpture. By utilizing black, white, and grey as key components in constructing and solving visual problems, Mieczkowski explored the depiction of three-dimensional space on two-dimensional surfaces and created three-dimensional works that emphasized the two-dimensional picture plane. Tracking Mieczkowski's work from his involvement in the Anonima Group in the 1960s to his sculptural works in the 1980s, I argue that the perception of depth of space depicted using hard-edged geometric shapes lies at the center of his artistic development. While the influence of previous movements such as De Stijl and Constructivism gave Mieczkowski a visual vocabulary, it was his interest in how we visually perceive the world around us that led him to create dimensional puzzles. This thesis draws heavily on primary sources, including letters, interviews, exhibition catalogs, and reviews.

    Committee: John-Michael Warner (Advisor); Albert Reischuck (Committee Member) Subjects: Art Criticism; Art History
  • 15. Ginsburg, Shoshana Machine-Based Interpretation and Classification of Image-Derived Features: Applications in Digital Pathology and Multi-Parametric MRI of Prostate Cancer

    Doctor of Philosophy, Case Western Reserve University, 2016, Biomedical Engineering

    The analysis of medical images--from magnetic resonance imaging (MRI) to digital pathology--for disease characterization typically involves extraction of hundreds of features, which may be used to predict disease presence, aggressiveness, or outcome. Unfortunately, the dimensionality of the feature space poses a formidable challenge to the construction of robust classifiers for predicting disease presence and aggressiveness. In this work we present novel strategies to facilitate the construction of robust, interpretable classifiers when the dimensionality of the feature space is high. In the context of prostate cancer, we demonstrate the benefit of our approach for identifying (a) radiomic features that are useful for detecting prostate cancer on multi-parametric MRI, (b) radiomic features that predict the risk of prostate cancer recurrence on T2-weighted MRI, and (c) histomorphometric features describing cellular and glandular architecture on digital pathology images that predict the risk of prostate cancer recurrence post-treatment. In the context of breast cancer, we identify histomorphometric features describing cancer patterns in estrogen receptor positive (ER+) breast cancer tissue slides that can predict (a) which cancer patients will have recurrence following treatment with tamoxifen and (b) risk category as determined by a 21 gene expression assay called Oncotype DX. Additionally, we also investigate whether radiomic features characterizing prostate tumors that manifest in the peripheral zone of the prostate are different from radiomic features characterizing transition zone tumors, and we develop a novel approach for pharmacokinetic modeling on dynamic contrast-enhanced MRI that relies exclusively on prostate voxels, with no reliance on an arterial input function or reference tissue.

    Committee: Anant Madabhushi (Advisor) Subjects: Biomedical Engineering; Medical Imaging; Radiology
  • 16. Coleman, Ashley Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human Perspective

    Master of Science (MS), Wright State University, 2015, Computer Science

    The purpose of this paper is to determine whether any of four commonly used dimensionality reduction techniques reliably extract the same features that humans perceive as distinguishing features. The four techniques used in this experiment were Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS), Isomap, and Kernel Principal Component Analysis (KPCA). They were applied to a dataset of images of five infrared military vehicles. Three of the five resulting dimensions of PCA matched a human feature, one of the five dimensions of MDS matched a human feature, two of the five dimensions of Isomap matched a human feature, and none of the resulting dimensions of KPCA matched any of the features that humans listed. Therefore, PCA was the most reliable technique for extracting the same features as humans when given a set number of images.

    Committee: Pascal Hitzler Ph.D. (Advisor); Mateen Rizki Ph.D. (Committee Member); John Gallagher Ph.D. (Committee Member) Subjects: Computer Science
  • 17. Landgraf, Andrew Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters

    Doctor of Philosophy, The Ohio State University, 2015, Statistics

    Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. Exponential family PCA is a popular alternative to dimensionality reduction of discrete data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the exponential family log-likelihood. We propose a new formulation of generalized PCA which extends Pearson's mean squared error optimality motivation for PCA to members of the exponential family. In contrast to the existing approach of matrix factorizations for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication. When the data are binary, we derive explicit solutions of the new generalized PCA (or logistic PCA) for data matrices of special structure and provide a computationally efficient algorithm for the principal component loadings in general. We also formulate a convex relaxation of the original optimization problem, whose solution might be more effective for prediction, and derive an accelerated gradient descent algorithm. The method and algorithms for binary data are extended to other distributions, including Poisson and multinomial, and the scope of the new formulation for generalized PCA is further extended to incorporate weights, missing data, and variable normalization. These extensions enhance the utility of the proposed method for a variety of tasks such as collaborative filtering and visualization. 
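
    The key structural difference from matrix-factorization approaches can be conveyed in a few lines: form saturated natural parameters from the binary data, then project them onto a low-rank subspace and map back through the logistic link. The sketch below substitutes a least-squares (SVD) projection for the Bernoulli-likelihood optimization the method actually uses, and the saturation level `m` is an illustrative assumption.

```python
import numpy as np

def logistic_pca_sketch(X, k=2, m=5.0):
    """Rough sketch of the natural-parameter projection idea for binary X:
    saturated logits m*(2x - 1) are projected onto a rank-k subspace
    (here via SVD rather than likelihood maximization)."""
    theta_sat = m * (2 * X - 1)                  # saturated natural parameters
    mu = theta_sat.mean(axis=0)
    U, s, Vt = np.linalg.svd(theta_sat - mu, full_matrices=False)
    V = Vt[:k].T
    theta_hat = mu + (theta_sat - mu) @ V @ V.T  # projection, not factorization
    return 1.0 / (1.0 + np.exp(-theta_hat))      # fitted probabilities
```

    Because the loadings V are estimated once, scores for new data reduce to a matrix multiplication of new saturated parameters onto V, mirroring the computational advantage described above.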
Through simulation experiments, we compare our formulation of generalized PCA to ordinary PCA and the p (open full item for complete abstract)
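    The projection idea in this abstract (low-rank natural-parameter estimates obtained by projecting the saturated model parameters, with new-data scores given by a single matrix multiplication) can be sketched roughly as follows. This is a hand-rolled numpy illustration, not the thesis's implementation: the cap `m` on the saturated logits is an illustrative tuning choice, and the SVD-based loadings are a heuristic stand-in for the Bernoulli log-likelihood optimization the work actually performs.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = (rng.random((50, 8)) > 0.5).astype(float)    # binary data matrix

    m = 4.0                                          # cap for the saturated logits (tuning choice)
    theta_sat = m * (2 * X - 1)                      # approximate saturated natural parameters

    mu = theta_sat.mean(axis=0)                      # main-effects offset
    centered = theta_sat - mu

    # Heuristic loadings: leading right singular vectors of the centered
    # saturated parameters (a warm start; the thesis instead optimizes
    # the exponential-family log-likelihood over orthonormal U).
    k = 2
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    U = Vt[:k].T                                     # p x k orthonormal loadings

    theta_hat = mu + centered @ U @ U.T              # low-rank natural-parameter estimates
    probs = 1.0 / (1.0 + np.exp(-theta_hat))         # fitted Bernoulli means

    # Scores on *new* data require only matrix multiplication,
    # since the number of parameters does not grow with n:
    X_new = (rng.random((5, 8)) > 0.5).astype(float)
    scores_new = (m * (2 * X_new - 1) - mu) @ U
    ```

    The key contrast with matrix-factorization exponential family PCA is visible in the last two lines: no per-observation parameters are fit, so scoring new rows is a projection, not a new optimization.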

    Committee: Yoonkyung Lee (Advisor); Vincent Vu (Committee Member); Yunzhang Zhu (Committee Chair) Subjects: Statistics
  • 18. Nsang, Augustine An Empirical Study of Novel Approaches to Dimensionality Reduction and Applications

    PhD, University of Cincinnati, 2011, Engineering and Applied Science: Computer Science and Engineering

    Dimensionality reduction is becoming increasingly important in the field of machine learning. In this thesis, we examine several traditional methods of dimensionality reduction, including random projections, principal component analysis, singular value decomposition, kernel principal component analysis, and the discrete cosine transform. We also examine several existing applications of random projections (or of dimensionality reduction in general). In their paper "Random projections in dimensionality reduction: Applications to image and text data" (2001), Bingham and Mannila suggest the use of random projections for query matching in a situation where a set of documents, rather than one particular document, is searched for. This suggests another application of random projections, namely reducing the complexity of the query process. In this thesis, we explain why this approach fails, and suggest three alternative approaches to reducing the complexity of the query process using dimensionality reduction. We also outline query-based dimensionality reduction methods that can be used for image and web data. In each of the traditional approaches to dimensionality reduction named above, each attribute in the reduced set is actually a linear combination of the attributes in the original data set. In this thesis, we take the position that true dimensionality reduction is obtained when the set of attributes in the reduced set is a proper subset of the attributes in the original data set, and we discuss seven novel approaches which satisfy this requirement. Using these seven approaches, as well as the RP and PCA approaches, we discuss several ways in which dimensionality reduction can be used for high-dimensional clustering and classification.
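    As a rough illustration of the random projections discussed in this abstract (not code from the thesis), a Gaussian random projection maps data to a lower-dimensional space while approximately preserving pairwise distances; the dimensions and seed below are arbitrary.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, p, k = 100, 1000, 50                 # reduce 1000 features to 50
    X = rng.standard_normal((n, p))

    # Gaussian random projection: entries drawn N(0, 1/k) so that
    # squared Euclidean distances are preserved in expectation.
    R = rng.standard_normal((p, k)) / np.sqrt(k)
    Y = X @ R

    # Distortion of one pairwise distance after projection.
    d_orig = np.linalg.norm(X[0] - X[1])
    d_proj = np.linalg.norm(Y[0] - Y[1])
    ratio = d_proj / d_orig                  # close to 1 with high probability
    ```

    Note that each of the k projected attributes is a linear combination of all p original attributes, which is exactly the property the thesis's feature-subset approaches are designed to avoid.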

    Committee: Anca Ralescu PhD (Committee Chair); Irene Diaz PhD (Committee Member); Sofia Visa PhD (Committee Member); Kenneth Berman PhD (Committee Member); Yizong Cheng PhD (Committee Member) Subjects: Computer Science
  • 19. DWIVEDI, SAURABH DIMENSIONALITY REDUCTION FOR DATA DRIVEN PROCESS MODELING

    MS, University of Cincinnati, 2003, Engineering : Industrial Engineering

    Data driven process modeling requires data acquisition for its analysis. This technique has become very common in industrial applications to comprehend any system/process under consideration. As the number of variables increases, the number of relationships to be considered increases, and it becomes very difficult to model and analyze the system. This is called the 'curse of dimensionality' and presents a challenge for the research community that has led to the concept of dimensionality reduction of data sets. Dimensionality reduction focuses on determining the parameters significant for representing the process with respect to the output of interest and discarding the unimportant ones. There are many methods available in the literature to deal with the issue of dimensionality reduction, but a generic methodology that handles different datasets (containing both continuous and discrete parametric values) is still sought. This thesis proposes a solution to the above problem using the existing statistical methods of principal component analysis and clustering techniques. An algorithm is developed and validated on standard data sets and a simulated data set. It is then applied to a real-world industrial dataset. The results obtained with the reduced data set, compared to the complete dataset, are better for both the classification and the approximation problems, but a few issues with the methodology were identified. A methodology for choosing the values of the user inputs (the clustering radius, the percentage of variance to be included in the algorithm, and the stopping criterion for the algorithm) provides scope for future work. Given the results obtained and the accuracies of the models built with the reduced data set, it can be said with confidence that the algorithm provides a generic solution to the problem of dimensionality reduction, though some issues need to be addressed to make the developed methodology comprehensive.
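    One of the user inputs this abstract mentions, the percentage of variance to include, is commonly handled by keeping the smallest number of principal components whose cumulative explained variance exceeds the threshold. A minimal numpy sketch on illustrative data (not the thesis's algorithm, which also involves clustering):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    # Illustrative data: 3 strong latent directions plus small noise in 10 features.
    latent = rng.standard_normal((200, 3)) * np.array([5.0, 3.0, 2.0])
    X = latent @ rng.standard_normal((3, 10)) + 0.1 * rng.standard_normal((200, 10))

    Xc = X - X.mean(axis=0)                          # center the data
    cov = Xc.T @ Xc / (len(Xc) - 1)                  # sample covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, descending

    explained = np.cumsum(eigvals) / eigvals.sum()   # cumulative explained variance
    threshold = 0.95                                 # user input: % variance to retain
    n_components = int(np.searchsorted(explained, threshold) + 1)
    ```

    The clustering radius and stopping criterion the thesis leaves to future work would sit on top of a step like this, so the choice of `threshold` here is purely illustrative.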

    Committee: Dr. Samuel Huang (Advisor) Subjects:
  • 20. Pathical, Santhosh Classification in High Dimensional Feature Spaces through Random Subspace Ensembles

    Master of Science in Engineering, University of Toledo, 2010, Engineering (Computer Science)

    This thesis presents an empirical simulation study on the application of machine learning ensembles based on the random subspace methodology to classification problems with high-dimensional feature spaces. The motivation is to address challenges associated with algorithm scalability, data sparsity, and information loss due to the so-called curse of dimensionality. A simulation-based empirical study is conducted to assess the performance profile of the random subspace or subsample ensemble classifier for high-dimensional spaces with up to 20,000 features. Subsampling rate and methodology, base learner type, base classifier count, and composition of base learners are among the parameters explored through the simulation study. The simulation study employed the WEKA Machine Learning Workbench and five datasets with large feature counts, up to 20,000, from the UCI Machine Learning Repository. The machine learners naive Bayes, k-nearest neighbor, and C4.5 decision tree were used as base classifiers of the random subspace ensemble, which used voting as the combiner method. Homogeneous (all base classifiers are based on a single machine learner type) as well as heterogeneous (base classifiers are a mix of multiple machine learners) random subspace ensembles were explored on the set of datasets for prediction accuracy, SAR, and CPU time performance measures. The simulation study further investigated the effect of random sampling with replacement, random sampling without replacement, and partitioning techniques on the random subspace ensemble. Simulation results indicated that random subspace ensembles which employ subsampling rates as low as 10% to 15%, 25 or more base classifiers, a mixed or hybrid composition of base learners, and random sampling without replacement perform competitively with other leading machine learning classifiers on the datasets evaluated.
Results also showed in a more generalized context that the random subspace or subsample ensembles scaled up with the incre (open full item for complete abstract)
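    The random subspace recipe the abstract describes (sample a small fraction of the features without replacement, train one base classifier per subspace, combine by voting) can be sketched as follows. This is a toy numpy illustration, not the thesis's WEKA setup: a nearest-centroid base learner stands in for naive Bayes/k-NN/C4.5, and the data, subsampling rate, and ensemble size are chosen only to mirror the reported 15% rate and 25 base classifiers.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Toy high-dimensional two-class data: only the first 5 of 100 features
    # carry signal (class means shifted apart); the rest are pure noise.
    n, p, informative, shift = 400, 100, 5, 2.0
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p))
    X[:, :informative] += shift * y[:, None]

    train, test = slice(0, 300), slice(300, None)

    def nearest_centroid(X_tr, y_tr, X_te):
        """Fit one centroid per class; label test rows by the closer centroid."""
        cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
        dists = np.stack([np.linalg.norm(X_te - c, axis=1) for c in cents])
        return dists.argmin(axis=0)

    rate, n_learners = 0.15, 25          # 15% subsampling rate, 25 base classifiers
    k = int(rate * p)
    votes = np.zeros((n - 300, 2))
    for _ in range(n_learners):
        # Random subspace: sample features without replacement.
        feats = rng.choice(p, size=k, replace=False)
        pred = nearest_centroid(X[train][:, feats], y[train], X[test][:, feats])
        votes[np.arange(len(pred)), pred] += 1

    y_hat = votes.argmax(axis=1)         # majority-vote combiner
    accuracy = (y_hat == y[test]).mean()
    ```

    Each base learner sees only 15 of the 100 features, so many individual learners are weak, but the vote across 25 of them recovers accuracy well above chance; this is the scaling behavior the thesis studies at up to 20,000 features.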

    Committee: Gursel Serpen PhD (Advisor); Mansoor Alam PhD (Committee Member); Suzan Orra PhD (Committee Member) Subjects: Computer Science