Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters
Landgraf, Andrew J

2015, Doctor of Philosophy, Ohio State University, Statistics.
Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. Exponential family PCA is a popular alternative for dimensionality reduction of discrete data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the exponential family log-likelihood. We propose a new formulation of generalized PCA which extends Pearson's mean squared error optimality motivation for PCA to members of the exponential family. In contrast to the existing approach of matrix factorizations for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication.
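To illustrate the last point for binary data, the following is a minimal sketch, not the thesis's algorithm. It assumes the saturated-model natural parameters (which are infinite for 0/1 data) are approximated by the finite values +m and -m, that the main effects `mu` are taken as column means, and that the loadings `U` are already available; here `U` is filled in with a plain SVD as a placeholder rather than fitted by maximizing the Bernoulli likelihood. All function names and the value of `m` are hypothetical.

```python
import numpy as np

def saturated_natural_params(X, m=4.0):
    """Finite stand-in for the saturated logits: +m where x = 1, -m where x = 0.
    The cutoff m is an assumption made for this sketch."""
    return m * (2.0 * np.asarray(X, dtype=float) - 1.0)

def project_scores(X, mu, U, m=4.0):
    """Principal component scores: centered saturated parameters times the loadings."""
    return (saturated_natural_params(X, m) - mu) @ U

def low_rank_logits(X, mu, U, m=4.0):
    """Low-rank natural-parameter estimate: mu + (theta_tilde - mu) U U^T."""
    return mu + project_scores(X, mu, U, m) @ U.T

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(50, 6))   # synthetic binary data
theta_train = saturated_natural_params(X_train)
mu = theta_train.mean(axis=0)                # assumed main effects (column means)

# Placeholder loadings with orthonormal columns, taken from an ordinary SVD.
_, _, Vt = np.linalg.svd(theta_train - mu, full_matrices=False)
U = Vt[:2].T                                 # d x k with k = 2

# Scores for unseen rows need no refitting, only a matrix product.
X_new = rng.integers(0, 2, size=(3, 6))
scores_new = project_scores(X_new, mu, U)
probs_new = 1.0 / (1.0 + np.exp(-low_rank_logits(X_new, mu, U)))
```

In this sketch the fitted quantities are only `mu` (length d) and `U` (d by k), so their size does not depend on the number of rows, in line with the abstract's remark about the parameter count.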

When the data are binary, we derive explicit solutions of the new generalized PCA (or logistic PCA) for data matrices of special structure and provide a computationally efficient algorithm for the principal component loadings in general. We also formulate a convex relaxation of the original optimization problem, whose solution might be more effective for prediction, and derive an accelerated gradient descent algorithm. The method and algorithms for binary data are extended to other distributions, including Poisson and multinomial, and the scope of the new formulation for generalized PCA is further extended to incorporate weights, missing data, and variable normalization. These extensions enhance the utility of the proposed method for a variety of tasks such as collaborative filtering and visualization. Through simulation experiments, we compare our formulation of generalized PCA to ordinary PCA and the previous formulation to demonstrate its benefits on both binary and count datasets. In addition, two datasets are analyzed. In the binary medical diagnoses data, we show that the new logistic PCA is better able to explain and predict the probabilities than standard PCA, and is able to do so with many fewer parameters than the previous formulation. On a dataset consisting of users' song listening counts, we show that generalized PCA gives better visualization of the loadings than standard PCA and improves the prediction accuracy in a recommendation task.
Yoonkyung Lee (Advisor)
Vincent Vu (Committee Member)
Yunzhang Zhu (Committee Chair)
116 p.

Recommended Citations

APA Citation

Landgraf, A. (2015). Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters. (Electronic Thesis or Dissertation). Retrieved from https://etd.ohiolink.edu/

MLA Citation

Landgraf, Andrew. "Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters." Electronic Thesis or Dissertation. Ohio State University, 2015. OhioLINK Electronic Theses and Dissertations Center. 24 Jul 2017.

Chicago Citation

Landgraf, Andrew. "Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters." Electronic Thesis or Dissertation. Ohio State University, 2015. https://etd.ohiolink.edu/

Files

Thesis_final_submitted.pdf (768.67 KB)