Doctor of Philosophy, The Ohio State University, 2013, EDU Teaching and Learning
Although multiple-choice assessment formats are commonly used throughout the educational hierarchy, they can measure only a small subset of important disciplinary competencies and practices. Consequently, science educators require open-response assessments that can validly measure more advanced skills and performances (e.g., producing written scientific explanations). However, open-response assessments are impractical in many educational contexts because of the high cost of scoring, delayed feedback to test-takers, and inconsistent scoring among human graders. This study examines the efficacy of automated computer scoring (ACS) of written explanations relative to human scoring. It aims to build ACS models using machine-learning methods to detect a suite of scientific and naive ideas in written scientific explanations, and to explore approaches for optimizing these models. Nine machine-learning models are developed and evaluated to detect six scientific concepts and three naive ideas of natural selection. In addition, the study examines the effects of three machine-learning parameters (i.e., n-gram selection, stop words, and misclassified data) on ACS model performance. To test the efficacy of the ACS models, a corpus of 10,270 written evolutionary explanations, produced in response to a variety of items differing in surface features, was gathered. The corpus was scored by expert human raters and by the ACS models, and four correspondence measures were calculated: kappa, raw agreement, precision, and recall. Methodologically, the ACS models were built using the SMO (Sequential Minimal Optimization) algorithm in the LightSIDE software.
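The four human-computer correspondence measures named above can be illustrated with a minimal sketch. The scores below are hypothetical binary labels (1 = concept detected, 0 = not detected), not data from the study; the computation uses only the Python standard library:

```python
from collections import Counter

def correspondence(human, machine, positive=1):
    """Compute raw agreement, Cohen's kappa, precision, and recall
    for paired binary human vs. machine scores."""
    n = len(human)
    # Raw agreement: fraction of cases where the two raters match
    raw = sum(h == m for h, m in zip(human, machine)) / n
    # Chance-expected agreement, needed for Cohen's kappa
    ph, pm = Counter(human), Counter(machine)
    expected = sum(ph[c] * pm[c] for c in set(human) | set(machine)) / n**2
    kappa = (raw - expected) / (1 - expected)
    # Precision and recall, treating the human score as ground truth
    tp = sum(h == positive and m == positive for h, m in zip(human, machine))
    precision = tp / sum(m == positive for m in machine)
    recall = tp / sum(h == positive for h in human)
    return raw, kappa, precision, recall

# Hypothetical paired scores for eight explanations
human   = [1, 1, 0, 1, 0, 0, 1, 0]
machine = [1, 0, 0, 1, 0, 1, 1, 0]
raw, kappa, precision, recall = correspondence(human, machine)
```

For these hypothetical labels the function returns raw agreement 0.75, kappa 0.5, precision 0.75, and recall 0.75; in the study itself such values would be computed per concept over the full scored corpus.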
Repeated-measures ANOVAs, Pearson correlations, and logarithmic regressions were used to examine the effects of the three machine-learning parameters on human-computer correspondence measures, and to examine the effects of sample size on model performance.
Committee: Ross H. Nehm PhD (Advisor); David L. Haury PhD (Committee Member); Lin Ding PhD (Committee Member)
Subjects: Educational Evaluation; Educational Technology; Science Education