Using Sentence Embeddings for Word Sense Induction
Author Info
Tallo, Philip T
ORCID® Identifier
http://orcid.org/0000-0002-7028-1229
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158
Abstract Details
Year and Degree
2020, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
Abstract
One of the primary goals of Natural Language Processing is to create high-quality text embeddings that can be used across many domains. The main area in which text embedding methods typically fall short is polysemy detection. A word is polysemous when it has multiple meanings (e.g., the word "bank" used in a financial context versus an ecological context). Current text embedding methods fail to handle this at all, training just one embedding for all meanings of a word. Discovering methods for polysemy detection is an active area of research. This thesis presents a Word Sense Induction (WSI) system based on the hypothesis that clustering sentence embeddings also yields a clustering over sense embeddings. To test this hypothesis, the thesis uses the SemEval 2010 benchmark to evaluate the Sentence-based WSI (S-WSI) methodology and compare it with state-of-the-art methods in the field. This benchmark is based on four key metrics: homogeneity, completeness, precision, and recall. The key advantage of the proposed approach over other methods is adaptability: the S-WSI methodology can use any sentence embedding model or clustering method, making it highly adaptable to the user's domain-specific needs. Its performance depends heavily on the sentence embedding model used, with some models achieving near state-of-the-art (SOTA) performance and others performing only slightly better than random.
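The core idea in the abstract (cluster the embeddings of sentences that contain a target word, and read each cluster as an induced sense) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The 2-D vectors, the gold sense labels, the small k-means routine, and the pairwise precision/recall scoring are all assumptions made for illustration; a real system would take its vectors from a sentence embedding model, and the thesis's exact scoring procedure may differ.

```python
from itertools import combinations

# Hypothetical toy data: sentences containing "bank", each with a made-up
# 2-D "sentence embedding" and a gold sense label used only for scoring.
DATA = [
    ("She deposited cash at the bank.", (0.90, 0.10), "finance"),
    ("The bank approved the loan.", (0.80, 0.20), "finance"),
    ("The bank raised interest rates.", (0.85, 0.15), "finance"),
    ("They fished from the river bank.", (0.10, 0.90), "river"),
    ("Reeds grew along the bank.", (0.20, 0.80), "river"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def kmeans(points, k, iters=20):
    """Plain k-means; farthest-point seeding keeps the example deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels


def same_group_pairs(labels):
    """All index pairs (i, j), i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def paired_precision_recall(pred, gold):
    """Pairwise precision and recall of a clustering against gold classes."""
    pred_pairs = same_group_pairs(pred)
    gold_pairs = same_group_pairs(gold)
    common = pred_pairs & gold_pairs
    precision = len(common) / len(pred_pairs) if pred_pairs else 0.0
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


embeddings = [vec for _, vec, _ in DATA]
gold = [sense for _, _, sense in DATA]

# Each cluster id stands for one induced sense of "bank".
labels = kmeans(embeddings, k=2)
precision, recall = paired_precision_recall(labels, gold)

for (sentence, _, _), label in zip(DATA, labels):
    print(label, sentence)
print("precision:", precision, "recall:", recall)
```

On this toy data the two clusters coincide with the two gold senses, so pairwise precision and recall are both 1.0. Putting all five sentences in a single cluster would keep recall at 1.0 but drop precision to 0.4, which is why the two scores are read together. Swapping in a different embedding model or clustering routine only changes how `embeddings` and `labels` are produced, which mirrors the adaptability the abstract describes.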
Committee
Ali Minai, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Anca Ralescu, Ph.D. (Committee Member)
Pages
78 p.
Subject Headings
Computer Science
Keywords
Natural Language Processing; Artificial Intelligence; Word Sense Induction; Sentence Embedding; Polysemy; Word Sense Disambiguation
Recommended Citations
Citations
APA Style (7th edition): Tallo, P. T. (2020). Using Sentence Embeddings for Word Sense Induction [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158
MLA Style (8th edition): Tallo, Philip. Using Sentence Embeddings for Word Sense Induction. 2020. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158.
Chicago Manual of Style (17th edition): Tallo, Philip. "Using Sentence Embeddings for Word Sense Induction." Master's thesis, University of Cincinnati, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158
Document number: ucin1613748873435158
Download Count: 275
Copyright Info
© 2020, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.