Doctor of Philosophy, The Ohio State University, 2023, Linguistics
Given a pair of sentences, a premise and a hypothesis, the task of natural language inference (NLI) consists of identifying whether the hypothesis is true (Entailment), false (Contradiction), or neither (Neutral), assuming that the premise is true. NLI is arguably one of the most important tasks for natural language understanding. Datasets have been collected in which pairs of sentences are annotated by multiple annotators with one of the three labels. However, it has been shown that annotation disagreement, or human label variation (Plank, 2022), is prevalent and systematic for NLI – human annotators sometimes do not give the same label for the same pair of sentences (Pavlick and Kwiatkowski, 2019, i.a.). Label variation questions the widespread assumption in natural language processing that each item has a single ground truth label and casts doubt on the validity of measuring models' ability to produce such ground truth labels.
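The task setup and the label variation it describes can be sketched minimally as follows; the premise, hypothesis, and annotator labels here are hypothetical illustrations, not drawn from any dataset in the dissertation:

```python
from collections import Counter

# Hypothetical NLI item: a premise-hypothesis pair judged by five annotators.
premise = "A man is playing a guitar on stage."
hypothesis = "A musician is performing."
annotations = ["entailment", "entailment", "neutral", "entailment", "neutral"]

# Under label variation, the "gold" annotation is better viewed as a
# distribution over the three labels than as a single ground-truth label.
counts = Counter(annotations)
distribution = {label: n / len(annotations) for label, n in counts.items()}
print(distribution)  # {'entailment': 0.6, 'neutral': 0.4}
```

An aggregation step (e.g. majority vote) would collapse this item to "entailment," discarding exactly the disagreement signal the dissertation argues models should capture.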
In this dissertation, I investigate why there is label variation in NLI and how to build models that capture it. First, I analyze the reasons for label variation from the perspective of linguists, by developing a taxonomy of reasons for label variation. I found that NLI label variation can arise for a wide range of reasons: some are due to uncertainty in the sentence meaning, while others are inherent to the NLI task definition. However, it is unclear how well the perspective of linguists reflects that of linguistically-uninformed annotators. Therefore, I collected annotators' explanations for the NLI labels they chose, creating the LiveNLI dataset containing ecologically valid explanations. I found that the annotators' reasons for label variation largely mirror the taxonomy, though some additional reasons emerged. The explanations also reveal within-label variation: annotators can choose the same label for different reasons. There is thus a wide range of variation that NLI models should capture.
Committee: Marie-Catherine de Marneffe (Advisor); Michael White (Committee Member); Chenhao Tan (Committee Member); Micha Elsner (Committee Member)
Subjects: Computer Science; Linguistics