Doctor of Philosophy, The Ohio State University, 2023, Statistics
Exponential family PCA (Collins et al., 2001) is a widely used dimension reduction tool for capturing a low-dimensional latent structure of exponential family data such as binary data or count data. As an extension of principal component analysis (PCA), it imposes a low-rank structure on the natural parameter matrix, which can be factorized into two matrices, namely, the principal component loadings matrix and scores matrix. These loadings and scores share the same interpretation and functionality as those in PCA. Loadings enable exploration of associations among variables, scores can be utilized as low-dimensional data embeddings, and estimated natural parameters can impute missing data entries. Despite the popularity of exponential family PCA, we find several statistical issues associated with this method. We investigate these issues from a statistical perspective and propose remedies in this dissertation.
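The low-rank structure described above can be illustrated with a minimal sketch for binary data. It assumes a Bernoulli model whose logit (natural parameter) matrix is the product of a scores matrix and a transposed loadings matrix. The function names, dimensions, and the omission of a mean offset are illustrative assumptions, not the dissertation's notation or code.

```python
import numpy as np

def simulate_binary_low_rank(n=200, d=10, k=2, seed=0):
    """Draw binary data whose natural parameter matrix has rank at most k.

    All names and sizes here are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    scores = rng.normal(size=(n, k))      # one k-dimensional score per observation
    loadings = rng.normal(size=(d, k))    # one k-dimensional loading per variable
    theta = scores @ loadings.T           # n x d matrix of logits (natural parameters)
    prob = 1.0 / (1.0 + np.exp(-theta))   # inverse of the canonical (logit) link
    x = rng.binomial(1, prob)             # binary data matrix
    return x, theta, scores, loadings

def bernoulli_loglik(x, theta):
    """Bernoulli log-likelihood in natural-parameter form:
    sum over entries of x * theta - log(1 + exp(theta))."""
    return float(np.sum(x * theta - np.logaddexp(0.0, theta)))

x, theta, scores, loadings = simulate_binary_low_rank()
loglik = bernoulli_loglik(x, theta)
```

Exponential family PCA would maximize this log-likelihood over both factors at once; the sketch only evaluates it at the generating parameters.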
Our primary concern arises from the joint estimation of loadings and scores through the maximum likelihood method. As in the well-known incidental parameter problem, this formulation with scores as separate parameters may result in inconsistency in the estimation of loadings under the classical asymptotic setting where the data dimension is fixed. We examine the population version of this formulation and show that it lacks Fisher consistency in loadings.
Additionally, estimating scores can be viewed as fitting a generalized linear model with loadings as covariates. The bias of maximum likelihood estimation (MLE) arises naturally in this process but is often ignored. Having identified two major sources of bias in the estimation process, we propose a bias correction procedure to reduce their effects. First, we address the discrepancy between the true loadings and their estimates under a limited sample size, using the iterative bootstrap method to debias the loadings estimates. Then, we account for sampling errors in loadings by treating them as covariates with me...
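The view of score estimation as a generalized linear model can be sketched for one binary observation: with the loadings held fixed, the score vector is the coefficient vector of a logistic regression whose design matrix is the loadings matrix. The sketch below is a plain Newton-Raphson MLE under that assumption. It does not implement the bias correction proposed in the dissertation, and the function name, the small ridge term, and the simulated inputs are hypothetical.

```python
import numpy as np

def estimate_score(x_row, loadings, n_iter=50, ridge=1e-8, tol=1e-10):
    """Logistic-regression MLE of one score vector, loadings treated as covariates.

    x_row:    length-d binary vector (one observation)
    loadings: d x k matrix, assumed known here (in practice it is estimated)
    """
    d, k = loadings.shape
    a = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(loadings @ a)))      # fitted probabilities
        grad = loadings.T @ (x_row - p)                # score function of the GLM
        w = p * (1.0 - p)                              # Bernoulli variance weights
        hess = loadings.T @ (loadings * w[:, None])    # Fisher information
        # Tiny ridge term (an assumption) keeps the linear solve well conditioned.
        step = np.linalg.solve(hess + ridge * np.eye(k), grad)
        a = a + step
        if np.max(np.abs(step)) < tol:
            break
    return a

# Hypothetical example: d = 500 variables, k = 2 components.
rng = np.random.default_rng(1)
loadings = rng.normal(size=(500, 2))
true_score = np.array([1.0, -0.5])
x_row = rng.binomial(1, 1.0 / (1.0 + np.exp(-(loadings @ true_score))))
score_hat = estimate_score(x_row, loadings)
```

Because each score is fitted from only d entries, its MLE is biased in finite dimensions, and replacing the true loadings with estimates adds further error; these are the effects the abstract refers to.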
Committee: Yoonkyung Lee (Advisor); Asuman Turkmen (Committee Member); YunZhang Zhu (Committee Member)
Subjects: Statistics