PhD, University of Cincinnati, 2016, Medicine: Biostatistics (Environmental Health)
Particulate matter (PM) has long been known to have a negative effect on public health. Epidemiological studies of air pollution and other sources of PM often rely on land use modeling for exposure assessment. This approach exploits the association between characteristics of the surrounding land and PM concentrations. Land use regression (LUR) is the most commonly implemented land use model, but it has several drawbacks, including model instability due to correlated predictors and an inability to capture non-linear relationships and complex interactions. Here, I use the random forest machine learning model within a land use framework to generate a novel land use random forest (LURF) model. Using ambient air sampling data from the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS), I developed LURF and LUR models for eleven elemental components of particulate matter: Al, Cu, Fe, K, Mn, Ni, Pb, S, Si, V, and Zn. I show that the LURF models used a larger number and a more diverse selection of land use predictors than the LUR models. Furthermore, the LURF models were more accurate and precise predictors of all elemental PM concentrations except Fe, Mn, and Ni.
To extend the usability of the LURF models, I applied the recently developed infinitesimal jackknife (IJ) for random forests to estimate the prediction variance. The IJ theorems were originally verified under the assumptions of the traditional random forest framework, namely CART trees and bootstrap resampling. Alternatives to the traditional random forest, such as subsampling instead of bootstrap resampling and conditional inference trees instead of CART trees, have been shown to increase the accuracy of the random forest algorithm and eliminate its variable selection bias. Here, I conduct simulation experiments to show that the IJ performs well when using these random forest variations. Specifically, using the conditional inference tree instead of the C...
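As an illustration of the infinitesimal jackknife idea mentioned above, the following is a minimal sketch, not the dissertation's code. It uses the commonly cited bagging form of the estimator, in which the variance of a bagged prediction is the sum over training observations of the squared covariance (across bootstrap replicates) between how often the observation was drawn and the replicate's prediction, minus a Monte Carlo bias correction. The function name `ij_variance`, its parameters, and the use of a plain sample mean as the base learner (in place of a CART or conditional inference tree) are assumptions made to keep the example self-contained.

```python
import random
from statistics import fmean


def ij_variance(train_y, predict_from_sample, n_boot=2000, seed=0):
    """Return (bagged prediction, raw IJ variance, bias-corrected IJ variance).

    train_y             -- list of training responses (hypothetical toy data)
    predict_from_sample -- base learner: maps a bootstrap sample to a prediction
    """
    rng = random.Random(seed)
    n = len(train_y)
    counts = []  # counts[b][i]: times observation i appears in replicate b
    preds = []   # preds[b]: base-learner prediction from replicate b

    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        c = [0] * n
        for i in idx:
            c[i] += 1
        counts.append(c)
        preds.append(predict_from_sample([train_y[i] for i in idx]))

    bagged = fmean(preds)
    centered = [p - bagged for p in preds]

    # Sum over observations of squared covariance between inclusion
    # counts and predictions, taken across bootstrap replicates.
    v_ij = 0.0
    for i in range(n):
        cov = sum((counts[b][i] - 1) * centered[b] for b in range(n_boot)) / n_boot
        v_ij += cov ** 2

    # Monte Carlo bias correction for using a finite number of replicates.
    correction = n / n_boot ** 2 * sum(d * d for d in centered)
    return bagged, v_ij, v_ij - correction


# Toy usage: the base learner is just the mean of the resampled responses.
y = [float(i % 7) + 0.5 * i for i in range(20)]
bagged, v_raw, v_corrected = ij_variance(y, fmean)
```

For this toy base learner the estimate can be checked by hand: with many replicates it should approach the sum of squared deviations of `y` from its mean, divided by `n` squared. Replacing `fmean` with a tree fitted to each resample, and drawing subsamples instead of bootstrap samples, would move the sketch toward the random forest variations the abstract describes, though the subsampling case needs its own form of the estimator.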
Committee: Patrick Ryan Ph.D. (Committee Chair); Roman A. Jandarov Ph.D. (Committee Member); Marepalli Rao Ph.D. (Committee Member)
Subjects: Biostatistics