Using Sentence Embeddings for Word Sense Induction


2020, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
One of the primary goals of the field of Natural Language Processing is to create high-quality text embeddings that can be used in many domains. The main area in which text embedding methods typically fall short is polysemy detection. A word is polysemous when it has multiple meanings (e.g., the word "bank" in a financial context versus an ecological context). Many current text embedding methods do not handle this at all, training just one embedding for all meanings of a word. Discovering methods for handling polysemy is an active area of research. This thesis presents a Word Sense Induction (WSI) system based on the hypothesis that, by clustering sentence embeddings, it is possible to obtain a clustering over sense embeddings as well. To test this hypothesis, this thesis uses the SemEval 2010 benchmark to evaluate the Sentence-based WSI (S-WSI) methodology and compare it with state-of-the-art methods in the field. This benchmark is based on four key metrics: homogeneity, completeness, precision, and recall. The key advantage of the proposed approach over other methods is adaptability: the S-WSI methodology can use any sentence embedding model or clustering method, making it highly adaptable to the user's domain-specific needs. However, the method is highly dependent on the sentence embedding model being used, with some models achieving near state-of-the-art (SOTA) performance and others performing only slightly better than random.
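The core S-WSI idea, clustering the sentence embeddings of a target word's usages so that each cluster acts as an induced sense, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 2-D vectors are hypothetical stand-ins for the output of a real sentence embedding model, k-means is only one of the interchangeable clustering methods the approach allows, and the metric functions follow the commonly used definitions of homogeneity and completeness (based on conditional entropy) and of paired precision and recall (instance pairs sharing a cluster versus sharing a gold sense). The official SemEval 2010 scorer may differ in details.

```python
import math
import random
from collections import Counter
from itertools import combinations


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's k-means; returns one cluster id (induced sense) per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def _entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) computed from joint label counts."""
    n = len(target)
    given_counts = Counter(given)
    joint = Counter(zip(given, target))
    return -sum((m / n) * math.log(m / given_counts[g]) for (g, _), m in joint.items())


def homogeneity(gold, pred):
    """1 when every predicted cluster contains a single gold sense."""
    h_c = _entropy(gold)
    return 1.0 if h_c == 0 else 1.0 - _conditional_entropy(gold, pred) / h_c


def completeness(gold, pred):
    """1 when every gold sense falls inside a single predicted cluster."""
    h_k = _entropy(pred)
    return 1.0 if h_k == 0 else 1.0 - _conditional_entropy(pred, gold) / h_k


def paired_precision_recall(gold, pred):
    """Compare instance pairs sharing a predicted cluster vs. a gold sense."""
    pairs = list(combinations(range(len(gold)), 2))
    same_pred = {p for p in pairs if pred[p[0]] == pred[p[1]]}
    same_gold = {p for p in pairs if gold[p[0]] == gold[p[1]]}
    both = len(same_pred & same_gold)
    # Assumed convention: score 0.0 when there are no pairs to compare.
    precision = both / len(same_pred) if same_pred else 0.0
    recall = both / len(same_gold) if same_gold else 0.0
    return precision, recall


# Hypothetical 2-D "sentence embeddings" for six sentences containing "bank".
vectors = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
gold = ["finance"] * 3 + ["river"] * 3

pred = kmeans(vectors, k=2)
print("induced senses:", pred)
print("homogeneity:", homogeneity(gold, pred))
print("completeness:", completeness(gold, pred))
print("paired precision/recall:", paired_precision_recall(gold, pred))
```

Swapping in a different embedding model or clustering method only changes how `vectors` and `pred` are produced; the evaluation functions stay the same, which is the kind of modularity the adaptability argument relies on.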
Ali Minai, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Anca Ralescu, Ph.D. (Committee Member)
78 p.

Recommended Citations

  • Tallo, P. T. (2020). Using Sentence Embeddings for Word Sense Induction [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158

    APA Style (7th edition)

  • Tallo, Philip. Using Sentence Embeddings for Word Sense Induction. 2020. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158.

    MLA Style (8th edition)

  • Tallo, Philip. "Using Sentence Embeddings for Word Sense Induction." Master's thesis, University of Cincinnati, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158

    Chicago Manual of Style (17th edition)