Under the framework of structural equation modeling (SEM), longitudinal data can be analyzed using latent growth models (LGMs). An extension of the simple LGM is the multilevel latent growth model, which can be used to fit clustered data. The purpose of this study is to investigate the performance of five missing data treatments (MDTs) for handling missingness due to longitudinal attrition in a multilevel LGM. The MDTs are: (1) listwise deletion (LD), (2) full information maximum likelihood (FIML), (3) expectation-maximization (EM) imputation, (4) multiple imputation based on regression (MI-Reg), and (5) multiple imputation based on predictive mean matching (MI-PMM).
A Monte Carlo simulation study was conducted to explore the research questions. First, population parameter values for the model were estimated from a nationally representative sample of elementary school students. Datasets were then simulated from a two-level LGM with different growth trajectories (constant, decelerating, accelerating) and at varying sample sizes (200, 500, 2000, 10000). After the datasets were generated, a designated proportion of data points (5%, 10%, 20%) was deleted according to one of two missingness mechanisms, missing at random (MAR) or missing not at random (MNAR), and the five missing data treatments were applied. Finally, the parameter estimates produced by each missing data treatment were compared to the true population parameter values and to each other according to four evaluation criteria: parameter estimate bias, root mean square error, length of 95% confidence intervals (CIs), and coverage rate of 95% CIs.
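The four evaluation criteria can be illustrated with a minimal sketch for a single parameter. The function name `evaluate_estimates`, the use of raw (rather than relative) bias, and the normal-theory interval (estimate ± 1.96 × standard error) are assumptions made for illustration; the study may define these quantities differently.

```python
import math


def evaluate_estimates(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo replications for one parameter.

    estimates, std_errors: one entry per replication (hypothetical inputs).
    true_value: the population value used to generate the data.
    z: critical value for the interval (1.96 assumes a normal-theory 95% CI).
    """
    n = len(estimates)

    # Raw bias: mean estimate minus the true value.
    bias = sum(estimates) / n - true_value

    # Root mean square error around the true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)

    # One confidence interval per replication.
    intervals = [(e - z * s, e + z * s) for e, s in zip(estimates, std_errors)]

    # Average interval length across replications.
    ci_length = sum(hi - lo for lo, hi in intervals) / n

    # Coverage: share of intervals that contain the true value.
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / n

    return {"bias": bias, "rmse": rmse, "ci_length": ci_length, "coverage": coverage}


# Toy replications for a slope whose true value is 0.5 (made-up numbers).
summary = evaluate_estimates([0.4, 0.5, 0.6, 0.9], [0.1, 0.1, 0.1, 0.1], 0.5)
print(summary)
```

In a full simulation, a summary of this kind would be computed separately for each parameter within each combination of MDT, sample size, attrition rate, growth trajectory, and missingness mechanism.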
Among the five MDTs studied, FIML is the only one that yields satisfactory bias levels and coverage rates for all parameters across all sample sizes, attrition rates, and growth trajectories under MAR. It is also the only MDT that consistently outperforms the conventional MDT, LD, in every aspect, especially as the proportion of missing data increases. Under MNAR, however, the FIML estimates of the predictor effects on slopes become biased, and coverage for those two parameters becomes unacceptable.
Under MAR, LD produces acceptable bias levels for most parameters, with the exception of the predictor effects. However, LD tends to generate wider CIs, and when a high proportion of missing data is combined with a small sample size, or when missingness is MNAR, bias generally increases and CI coverage deteriorates.
This study found that EM imputation does not perform well under either MAR or MNAR. On average, EM tends to underestimate standard errors unless the sample size is very large. Fewer than half of the parameters have intervals with satisfactory coverage under EM imputation, and coverage for variance components is generally low.
Like EM, MI-Reg fails to produce satisfactory bias levels for certain slope-related parameters and for level-2 measurement error, even under MAR. In contrast to EM, MI-Reg tends to overestimate standard errors. Coverage is generally better with MI-Reg than with EM. MI-PMM performs similarly to MI-Reg, though it tends to yield higher bias and lower coverage for certain parameters.