Files
Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries.pdf (2.78 MB)
Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries
Author Info
Sain, Joy Prakash
ORCID® Identifier
http://orcid.org/0000-0001-5605-046X
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=wright1693266174263903
Abstract Details
Year and Degree
2023, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.
Abstract
Information Extraction (IE) techniques are essential for gleaning valuable information about entities and their relationships from unstructured text and for creating a structured representation of that text for downstream Natural Language Processing (NLP) tasks, including question answering, text summarization, and knowledge graph construction. Supervised Machine Learning (ML) techniques have been widely used in IE. While the resulting extraction algorithms are very effective, they require a large amount of annotated data, which can be expensive and time-consuming to create. Additionally, creating high-quality gold-standard annotations can be challenging, particularly for new domains or languages that lack sufficient resources to facilitate annotation.
This dissertation develops minimally supervised approaches to extracting Named Entities (NEs) from text. It specifically addresses the challenges that arise in distantly supervised NE extraction, in which domain-specific dictionaries are used to automatically match and assign labels to data, and the labeled data is subsequently used to train an ML model for the extraction task. A key challenge in learning an effective ML model with distant supervision is the incompleteness of the dictionaries, which can result in incomplete, partial, or noisy annotations. When annotations are incomplete or missing, training a sequence labeling model for Named Entity Recognition (NER) may result in suboptimal learning. To address these challenges, in this dissertation, I propose novel approaches to improving dictionary coverage that use a state-of-the-art phrase extraction technique and a domain-specific dictionary to extract phrases from unlabeled text data. Leveraging the lexical, syntactic, and contextual features of the entities present in the initial dictionaries, I propose headword-based and span-based classification approaches to categorize the extracted phrases into their corresponding entity classes.
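The dictionary-matching step described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `distant_label`, the greedy longest-match strategy, the BIO tag scheme, and the toy dictionary are all assumptions made for illustration. The example shows how an incomplete dictionary produces missing annotations: "ibuprofen" is absent from the toy dictionary, so it is silently tagged as a non-entity.

```python
from typing import Dict, List, Tuple


def distant_label(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a dictionary.

    Hypothetical helper: keys of `dictionary` are lowercased token tuples,
    values are entity class names.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = dictionary.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy, deliberately incomplete dictionary (illustrative data only).
toy_dictionary = {("aspirin",): "DRUG", ("heart", "attack"): "DISEASE"}
tokens = "Aspirin and ibuprofen may reduce heart attack risk".split()
tags = distant_label(tokens, toy_dictionary)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeling model trained on these tags would treat "ibuprofen" as a negative example, which is the kind of missing annotation the abstract identifies as a source of suboptimal learning.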
The span-based approach generates synthetic negative examples for cases of incomplete and partial annotation and uses them to train the model. Rather than considering all possible non-entity spans or randomly sampling negative spans, as existing span-based NER models do, the proposed approach uses stratified negative sampling to improve the model’s effectiveness and its ability to learn entity boundaries accurately. The experimental evaluation conducted on standard benchmark datasets demonstrates that the proposed approach achieves state-of-the-art performance when compared to other baseline methods.
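The idea of stratified negative sampling over candidate spans can be sketched as follows. The abstract does not say how the strata are defined, so this sketch makes an explicit assumption: unannotated spans are split into those that overlap an annotated entity (near-boundary negatives) and those that do not, and an equal number is drawn from each group. The names `candidate_spans`, `overlaps`, and `stratified_negatives`, and the parameters `per_stratum` and `max_len`, are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def candidate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """Enumerate every span of at most max_len tokens."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start + 1, min(start + max_len, n_tokens) + 1)
    ]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def stratified_negatives(
    n_tokens: int,
    positives: Set[Span],
    per_stratum: int,
    max_len: int = 4,
    seed: int = 0,
) -> Dict[str, List[Span]]:
    """Sample negative spans separately from each stratum.

    Assumed strata (not taken from the source): spans overlapping an
    annotated entity, and spans disjoint from all annotated entities.
    """
    rng = random.Random(seed)
    strata: Dict[str, List[Span]] = {"overlapping": [], "disjoint": []}
    for span in candidate_spans(n_tokens, max_len):
        if span in positives:
            continue  # annotated entities are never used as negatives
        key = "overlapping" if any(overlaps(span, p) for p in positives) else "disjoint"
        strata[key].append(span)
    return {
        name: rng.sample(spans, min(per_stratum, len(spans)))
        for name, spans in strata.items()
    }


# Eight tokens with two dictionary-matched entities: tokens [0, 1) and [5, 7).
positives = {(0, 1), (5, 7)}
negatives = stratified_negatives(n_tokens=8, positives=positives, per_stratum=3)
print(negatives)
```

Sampling the overlapping stratum separately guarantees that the training set contains negatives that differ from a true entity only at the boundary, whereas uniform random sampling may return mostly spans far from any entity. Under distant supervision, some sampled spans may still be unannotated true entities, so this sketch does not remove label noise by itself.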
Committee
Michael Raymer, Ph.D. (Advisor)
Krishnaprasad Thirunarayan, Ph.D. (Advisor)
Tanvi Banerjee, Ph.D. (Committee Member)
Charese Smiley, Ph.D. (Committee Member)
Pages
111 p.
Subject Headings
Artificial Intelligence; Computer Science
Keywords
Information Extraction; Natural Language Processing; Named Entity Recognition; Domain-Specific Dictionary; Distant Supervision
Recommended Citations
Citations
APA Style (7th edition)
Sain, J. P. (2023). Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1693266174263903
MLA Style (8th edition)
Sain, Joy. Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries. 2023. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1693266174263903.
Chicago Manual of Style (17th edition)
Sain, Joy. "Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries." Doctoral dissertation, Wright State University, 2023. http://rave.ohiolink.edu/etdc/view?acc_num=wright1693266174263903
Document number:
wright1693266174263903
Download Count:
246
Copyright Info
© 2023, some rights reserved.
Reliable Named Entity Recognition Using Incomplete Domain-Specific Dictionaries by Joy Prakash Sain is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by Wright State University and OhioLINK.