Doctor of Philosophy, The Ohio State University, 2021, Computer Science and Engineering
Natural language provides a universal and efficient way for humans to express their intent and perceive the world. This has inspired a surge of natural language interface (NLI) systems, which enable humans to acquire knowledge and solve problems using natural language alone. These include question answering systems such as the early BASEBALL system and IBM Watson, as well as virtual assistants such as Amazon Alexa, Apple Siri, Google Home, and Microsoft Cortana.
Despite this remarkable progress, building NLIs that can reliably serve users over the long term has never been easy. In this dissertation, we characterize and study the three stages of the NLI life cycle: (1) Data Collection, where system developers collect training data to bootstrap the NLI system; (2) Model Development, where system developers design and implement the backend machine learning model and improve its capacity until its performance reaches commercial grade; and (3) User Interaction, where the deployed NLI system is expected to interact with users and serve them reliably. The first two stages, Data Collection and Model Development, both take place before the system is deployed. We first summarize the history and the status quo of NLI research, as well as the challenges in each of the three stages. We then present our research contributions toward advancing NLIs in each stage.
Specifically, in Part II, we discuss solutions for improving the first two stages of NLI construction (i.e., those before deployment). We focus on constructing NLIs to code snippets, with applications in software engineering. Collecting training data for such specialized domains is typically expensive because annotators need domain expertise. To address this problem, we explore training a machine learning model to automatically extract data from domain-specific online forums (e.g., Stack Overflow). Bootstrapped with a small amount of annotations, our model is trained and applied…
Committee: Huan Sun (Advisor); Srinivasan Parthasarathy (Committee Member); Yu Su (Committee Member); Arnab Nandi (Committee Member); Douglass Schumacher (Committee Member)
Subjects: Computer Engineering; Computer Science