Doctor of Philosophy (Ph.D.), Bowling Green State University, 2021, Statistics
High-dimensional data have been widely encountered in a great variety of areas such as bioinformatics, medicine, marketing, and finance over the past few decades. The curse of dimensionality presents both methodological and computational challenges. Many traditional statistical modeling techniques perform well for low-dimensional data, but their performance begins to deteriorate when they are extended to high-dimensional data. Among all modeling techniques, variable selection plays a fundamental role in high-dimensional data modeling.
To deal with the high-dimensionality problem, a large number of variable selection approaches based on regularization have been developed, including but not limited to LASSO (Tibshirani, 1996), SCAD (Fan and Li, 2001), and the Dantzig selector (Candes and Tao, 2007). However, as the dimensionality grows, those regularization approaches may not perform well due to the simultaneous challenges in computational expediency, statistical accuracy, and algorithm stability (Fan et al., 2009). To address those challenges, a series of feature screening procedures have been proposed. Sure independence screening (SIS) is a well-known procedure for variable selection in linear models with high and ultrahigh dimensional data based on the Pearson correlation (Fan and Lv, 2008). Yet the original SIS procedure mainly focused on linear models with a continuous response variable. Fan and Song (2010) extended this method to generalized linear models by ranking the maximum marginal likelihood estimator (MMLE) or the maximum marginal likelihood itself. In this dissertation, we consider extending the SIS procedure to high-dimensional generalized linear models with a binary response variable.
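As a rough illustration of the correlation-ranking idea behind SIS described above, the following minimal sketch ranks features by the absolute value of their marginal Pearson correlation with the response and keeps the top d. It is not the procedure proposed in this dissertation; the function name `sis_screen`, the simulated data, and the cutoff `d` are all assumptions made for the example.

```python
import numpy as np


def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal Pearson
    correlation with y (hypothetical helper, for illustration only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; give them correlation 0.
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(corr), kind="stable")
    return order[:d], corr


# Simulated example (assumed setup): n = 200 observations, p = 1000 features,
# only features 0 and 5 actually influence the response.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(n)

d = 20                               # illustrative cutoff, not a recommendation
keep, corr = sis_screen(X, y, d)
print(sorted(keep.tolist()))
```

The screening step costs only one pass over the columns of X, which is why it is typically used to reduce the dimension before a regularized method such as LASSO or SCAD is fitted to the retained features.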
We propose a two-stage feature screening procedure for generalized linear models with a binary response based on point-biserial correlation. The point-biserial correlation is an estimate of the correlation between one continuous variable and ...
Committee: Junfeng Shang (Committee Chair); Emily Freeman Brown (Committee Member); Hanfeng Chen (Committee Member); Wei Ning (Committee Member)
Subjects: Statistics