A linear mixed model is a useful technique for explaining observations by regarding them as realizations of random variables, especially when repeated measurements are taken on statistical units, as in longitudinal data. In practice, however, many potential factors are considered to affect the observations when only a few actually do. Statisticians therefore aim to select the significant factors from among all the candidates, a process known as model selection. Among the approaches to linear mixed model selection, penalized methods have developed substantially over the last several decades.
In this dissertation, to alleviate the overfitting problem common to most penalized methods and to improve selection accuracy, we focus on a penalized approach via cross-validation. Unlike existing methods, which use the whole data set both to fit and to select models, we split the procedure into two stages: an adaptive lasso penalized function is customized for fitting in the first stage, and a marginal BIC criterion is used for selection in the second. The main advantage of this approach is that it reduces the dependence between model construction and model evaluation.
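To illustrate the flavor of this two-stage idea, the following is a minimal numerical sketch for fixed effects only, not the dissertation's actual implementation: an adaptive lasso (here solved as a column-rescaled ordinary lasso, with weights from an initial OLS fit) is fitted on a training split, and the tuning parameter is then chosen by a BIC-type criterion evaluated on a held-out split. All variable names and the simulated data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Simulated data: 8 candidate fixed effects, only 3 are truly nonzero.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

# Split: stage 1 fits on `tr`, stage 2 evaluates on `va`,
# decoupling model construction from model evaluation.
tr, va = np.arange(0, 140), np.arange(140, n)

# Adaptive weights from an initial OLS fit; the weighted (adaptive)
# lasso is solved by rescaling the columns of X, a standard trick.
w = 1.0 / np.abs(LinearRegression().fit(X[tr], y[tr]).coef_)

def bic_at(lam):
    """Fit the adaptive lasso at penalty `lam` on the training split,
    then score the resulting model with a BIC-type criterion on the
    held-out split."""
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    fit.fit(X[tr] / w, y[tr])
    coef = fit.coef_ / w                      # back to original scale
    resid = y[va] - X[va] @ coef
    m, k = len(va), np.count_nonzero(coef)
    return m * np.log(resid @ resid / m) + k * np.log(m), coef

lams = np.logspace(-3, 0, 30)
best = min(lams, key=lambda lam: bic_at(lam)[0])
support = np.flatnonzero(bic_at(best)[1])     # selected fixed effects
```

In this toy setting the selected support should contain the three truly active effects; the real procedure additionally handles random effects and uses the marginal likelihood, which this sketch omits.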
Because of the complex structure of mixed models, we adopt a modified Cholesky decomposition to reparameterize the model, which in turn significantly reduces the dimension of the penalized function. Moreover, since the random effects are unobserved, the maximizer of the penalized function has no closed form, so we implement an EM algorithm to obtain full inference on the parameters. Furthermore, owing to computational limits and the moderately small samples common in practice, some noise factors may remain in the model, particularly among the fixed effects. To eliminate these noise factors, a likelihood ratio test is employed to screen the fixed effects. We call the overall procedure adaptive lasso via cross-validation.
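As a hedged sketch of why such a reparameterization is convenient for selection (the exact decomposition used in the dissertation may differ), one common modified Cholesky form writes the random-effects covariance as Psi = Lambda Gamma Gamma' Lambda, with Lambda diagonal and nonnegative and Gamma unit lower triangular. Shrinking a diagonal entry of Lambda to zero then removes the corresponding random effect entirely:

```python
import numpy as np

q = 3
lam = np.array([1.2, 0.8, 0.5])   # diagonal of Lambda (illustrative values)
gamma = np.eye(q)                 # unit lower-triangular Gamma
gamma[np.tril_indices(q, -1)] = [0.3, -0.4, 0.2]

# Psi is symmetric positive semidefinite by construction, for any
# choice of the unconstrained parameters lam >= 0 and gamma.
Lam = np.diag(lam)
Psi = Lam @ gamma @ gamma.T @ Lam

# Setting lam[2] = 0 zeros the third row and column of Psi, i.e. the
# third random effect drops out of the model -- penalizing lam toward
# zero therefore performs random-effect selection directly.
lam2 = lam.copy()
lam2[2] = 0.0
Psi2 = np.diag(lam2) @ gamma @ gamma.T @ np.diag(lam2)
```

The appeal of this parameterization is that positive semidefiniteness holds automatically, so the penalized function can treat the entries of Lambda and Gamma as unconstrained parameters.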
We further show that the proposed approach achieves selection and estimation consistency simultaneously. Both simulation studies and real data examples are provided to validate the method.
Finally, a brief conclusion is drawn and possible further improvements are discussed.