Doctor of Philosophy, The Ohio State University, 2020, Computer Science and Engineering
We introduce a virtual patient question-answering dialogue system, used for training medical students to interview real patients, which presents many unique opportunities for research in linguistics, speech, and dialogue. Among the most challenging research topics at this point in the system's development are issues relating to scarcity of training data. We address three main problems.
The first challenge is that many questions are very rarely asked of the virtual patient, leaving little data from which to learn adequate models of them. We validate one approach to this problem, combining a statistical question classification model with a rule-based system, by deploying it in an experiment with live users. Additional work further improves rare-question performance by using a recurrent neural network model with a multi-headed self-attention mechanism. We contribute an analysis of the reasons for this improved performance, highlighting specialization and overlapping concerns in independent components of the model.
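As a rough illustration of combining a statistical classifier with a rule-based system, the sketch below falls back to rules when the statistical model is unsure. The abstract does not say how the two systems are combined, so the confidence threshold, the function names, and the toy models are all assumptions made for this example rather than the dissertation's method.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: a statistical model returns (label, confidence),
# a rule-based model returns a label or None when no rule fires.
StatModel = Callable[[str], Tuple[str, float]]
RuleModel = Callable[[str], Optional[str]]


def hybrid_classify(question: str, stat_model: StatModel,
                    rule_model: RuleModel, threshold: float = 0.5) -> str:
    """Trust the statistical model when confident; otherwise try the rules.

    The threshold-based fallback is an assumed combination scheme.
    """
    label, confidence = stat_model(question)
    if confidence >= threshold:
        return label
    rule_label = rule_model(question)
    # If no rule matches, keep the statistical guess.
    return rule_label if rule_label is not None else label


def toy_stat(question: str) -> Tuple[str, float]:
    # Toy stand-in: confident only on a frequently asked question.
    if "pain" in question:
        return ("ask_pain", 0.9)
    return ("ask_pain", 0.2)


def toy_rules(question: str) -> Optional[str]:
    # Toy stand-in: one hand-written rule for a rarely asked question.
    return "ask_pets" if "pets" in question else None


if __name__ == "__main__":
    for q in ["do you have any pain", "do you have any pets"]:
        print(q, "->", hybrid_classify(q, toy_stat, toy_rules))
```

The appeal of a scheme like this is that hand-written rules can cover rare questions for which the statistical model has seen too few examples, while frequent questions stay with the learned model.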
Another data scarcity problem for the virtual patient project is the challenge of adequately characterizing questions that are deemed out-of-scope. By definition, the set of such questions is unbounded, which makes this problem particularly challenging. We contribute a characterization of the problem as it manifests in our domain, a baseline approach to handling it, and an analysis of the corresponding improvement in performance.
Finally, we contribute a method for improving the performance of domain-specific tasks such as ours, which take the output of off-the-shelf speech recognition as input, when no in-domain speech data is available. This method augments text training data for the downstream task with inferred phonetic representations, making the downstream task tolerant of speech recognition errors. We also see performance improvements from sampling simulated errors to replace the text inputs during training. Future enhancements to …
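The idea of sampling simulated recognition errors to replace text inputs during training can be sketched as follows. This is a minimal illustration only: the confusion table, the word-level substitution, and the names `CONFUSIONS`, `sample_noisy`, and `augment` are hypothetical, and the abstract does not describe how the actual error model is built.

```python
import random

# Hypothetical confusion table: word -> similar-sounding alternatives that a
# speech recognizer might produce instead. A real table would be derived from
# a phonetic model; these entries are made up for illustration.
CONFUSIONS = {
    "pain": ["pane", "paying"],
    "your": ["you're", "yore"],
    "heart": ["hart", "hard"],
}


def sample_noisy(text, rate, rng):
    """Replace each confusable word with a sampled alternative with probability `rate`."""
    out = []
    for word in text.split():
        alts = CONFUSIONS.get(word.lower())
        if alts and rng.random() < rate:
            out.append(rng.choice(alts))
        else:
            out.append(word)
    return " ".join(out)


def augment(examples, rate=0.3, seed=0):
    """Return (noisy_text, label) pairs; labels are left unchanged."""
    rng = random.Random(seed)
    return [(sample_noisy(text, rate, rng), label) for text, label in examples]


if __name__ == "__main__":
    data = [("where is your pain", "ask_pain_location")]
    print(augment(data, rate=1.0))
```

Resampling the noise on each training pass, rather than fixing one corrupted copy, would expose the downstream classifier to a wider variety of plausible recognition errors for the same labeled question.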
Committee: Eric Fosler-Lussier PhD (Advisor); Michael White PhD (Committee Member); Yu Su PhD (Committee Member)
Subjects: Artificial Intelligence; Computer Science; Educational Software; Linguistics