Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning

Williams, Scott David

Williams thesis.pdf (332.23 KB)

Author Info

Williams, Scott David

ORCID® Identifier

http://orcid.org/0000-0003-3332-4485

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=wright1682700129672253

Year and Degree

2023, Master of Science (MS), Wright State University, Computer Science.

Abstract

Obtaining accurate inferences from deep neural networks is difficult when models are trained on instances with conflicting labels. Algorithmic recognition of online hate speech illustrates this. No human annotator is perfectly reliable, so multiple annotators evaluate and label online posts in a corpus. Labeling scheme limitations, differences in annotators' beliefs, and limits to annotators' honesty and carefulness cause some labels to disagree. Consequently, decisive and accurate inferences become less likely. Some practical applications such as social research can tolerate some indecisiveness. However, an online platform using an indecisive classifier for automated content moderation could create more problems than it solves. Disagreements can be addressed in training by using the label a majority of annotators assigned (majority vote), training only with unanimously annotated cases (clean filtering), and representing training labels as probabilities (soft labeling). This study shows clean filtering occasionally outperforming majority voting, and soft labeling outperforming both.
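
The three strategies named in the abstract (majority vote, clean filtering, soft labeling) can be illustrated with a minimal Python sketch. The toy data, the binary label scheme, the three-annotators-per-post setup, and all function names are assumptions made for illustration; this is not the thesis's implementation.

```python
from collections import Counter

# Hypothetical toy data: each post has labels from three annotators
# (0 = not hate, 1 = hate). Values are illustrative only.
annotations = {
    "post_a": [1, 1, 1],
    "post_b": [0, 1, 0],
    "post_c": [0, 0, 0],
    "post_d": [1, 0, 1],
}
NUM_CLASSES = 2  # assumption: binary labeling scheme


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def clean_filter(dataset):
    """Keep only posts whose annotators all agree, with that shared label."""
    return {k: v[0] for k, v in dataset.items() if len(set(v)) == 1}


def soft_label(labels, num_classes):
    """Represent the votes as a probability distribution over classes."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(num_classes)]


majority = {k: majority_vote(v) for k, v in annotations.items()}
clean = clean_filter(annotations)
soft = {k: soft_label(v, NUM_CLASSES) for k, v in annotations.items()}

print(majority)
print(clean)
print(soft)
```

In this sketch, majority voting keeps every post but discards the dissenting vote, clean filtering drops the two contested posts entirely, and soft labeling keeps every post while preserving the disagreement as a distribution that a classifier could be trained against, for example with a cross-entropy loss.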

Committee

Krishnaprasad Thirunarayan, Ph.D. (Advisor)
Shu Schiller, Ph.D. (Committee Member)
Michael Raymer, Ph.D. (Committee Member)

Pages

58 p.

Subject Headings

Computer Science

Keywords

natural language processing; classification; annotations; deep learning; tweets; content moderation; hate speech; MMHS150K

Recommended Citations

APA Style (7th edition)
Williams, S. D. (2023). Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning [Master's thesis, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1682700129672253

MLA Style (8th edition)
Williams, Scott. Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning. 2023. Wright State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1682700129672253.

Chicago Manual of Style (17th edition)
Williams, Scott. "Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning." Master's thesis, Wright State University, 2023. http://rave.ohiolink.edu/etdc/view?acc_num=wright1682700129672253

Document number:

wright1682700129672253

Download Count:

135

Copyright Info

Comparative Adjudication of Noisy and Subjective Data Annotation Disagreements for Deep Learning by Scott David Williams is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by Wright State University and OhioLINK.
