
Research On Word Sense Disambiguation Method Based On Word Embedding

Posted on: 2019-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: X W Lv
Full Text: PDF
GTID: 2438330563957661
Subject: Computer technology
Abstract/Summary:
Word sense disambiguation (WSD) is the task of automatically determining the correct meaning of an ambiguous word in a given context. It is a basic technology in natural language processing, and its performance strongly affects many downstream applications such as text mining, automatic summarization, machine translation, and information retrieval. Improving the performance of word sense disambiguation is therefore an urgent need.

The Lesk algorithm is the earliest gloss overlap method and one of the classical word sense disambiguation algorithms. It selects the best sense of an ambiguous word by counting the lexical overlaps between the context and each dictionary gloss of that word. In recent years, many researchers have proposed Lesk-based word sense disambiguation methods. However, numerous studies neglect how the words surrounding the ambiguous word influence disambiguation when they use contextual information. In addition, dictionary glosses are usually short, so the number of overlapping words is small or even zero. To address this problem, we represent both the context and the glosses as vectors and compute the similarity between them; the sense whose gloss is most similar to the context is taken as the disambiguation result. Beyond this, the individual senses of an ambiguous word are used with different frequencies. This thesis therefore considers two factors: the distance between each context word and the ambiguous word, and the frequency distribution of the senses.

The research consists mainly of the following two parts.

First, a basic framework for word sense disambiguation based on Word2vec is proposed. Word vectors are trained on the Wikipedia corpus. The trained word vectors are used to generate flat (unweighted) representations of the context vector and the sense vectors. Combined with the sense frequency distribution obtained from WordNet, a comprehensive score calculation model is developed. Experiments on the Senseval-3 dataset show the effectiveness of the proposed disambiguation method.

Second, a word sense disambiguation method that combines distance weights with the sense frequency distribution is proposed. The context vector generated by the flat representation does not account for the distance between each context word and the ambiguous word. To account for it, we fuse distance weights into the context vector and study how context vectors generated with three weight functions, based on the Gaussian kernel, the Laplacian kernel, and the Cauchy kernel, affect disambiguation. In addition to the word vectors generated by Word2vec, the effect of word vectors generated by GloVe is also studied. The experimental results show that the Gaussian kernel function captures the distance weights of the context better than the other two kernels, and that word vectors trained with GloVe give better disambiguation results.
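
The approach described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the kernel formulas, the bandwidth, or the exact form of the comprehensive score, so the function names, the `sigma` parameter, the linear interpolation with `alpha`, and the toy two-dimensional embeddings below are all assumptions chosen for illustration.

```python
import math

# Distance weight functions. d is the signed token offset between a context
# word and the ambiguous word; sigma is a hypothetical bandwidth parameter.
def gaussian_weight(d, sigma=2.0):
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def laplacian_weight(d, sigma=2.0):
    return math.exp(-abs(d) / sigma)

def cauchy_weight(d, sigma=2.0):
    return 1.0 / (1.0 + (d * d) / (sigma * sigma))

def flat_weight(d):
    # Flat representation: every context word counts equally.
    return 1.0

def context_vector(tokens, target_index, embeddings, weight_fn):
    """Weighted average of the embeddings of the words around the target."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for i, token in enumerate(tokens):
        if i == target_index or token not in embeddings:
            continue
        w = weight_fn(i - target_index)
        for k, x in enumerate(embeddings[token]):
            total[k] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

def mean_vector(words, embeddings):
    """Unweighted average of the embeddings of the gloss words."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * len(next(iter(embeddings.values())))
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def disambiguate(tokens, target_index, senses, embeddings,
                 weight_fn=gaussian_weight, alpha=0.8):
    """Return the sense id with the highest combined score.

    senses maps a sense id to (gloss_words, relative_frequency).
    The linear mix controlled by alpha is an assumption, not the thesis formula.
    """
    ctx = context_vector(tokens, target_index, embeddings, weight_fn)
    best_sense, best_score = None, float("-inf")
    for sense_id, (gloss_words, frequency) in senses.items():
        similarity = cosine(ctx, mean_vector(gloss_words, embeddings))
        score = alpha * similarity + (1.0 - alpha) * frequency
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense

# Toy data (hypothetical): 2-d vectors stand in for trained word vectors.
toy_embeddings = {
    "river": [1.0, 0.0], "water": [0.9, 0.1],
    "money": [0.0, 1.0], "deposit": [0.1, 0.9],
}
toy_senses = {
    "bank_river": (["river", "water"], 0.3),
    "bank_money": (["money", "deposit"], 0.7),
}
toy_tokens = ["he", "sat", "on", "the", "bank", "of", "the", "river"]
chosen = disambiguate(toy_tokens, 4, toy_senses, toy_embeddings)
```

Passing `laplacian_weight`, `cauchy_weight`, or `flat_weight` as `weight_fn` swaps the weighting scheme without changing the rest of the pipeline, which mirrors the comparison the abstract describes. In practice the toy dictionaries would be replaced by word vectors trained with Word2vec or GloVe and by sense glosses and frequencies drawn from WordNet.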
Keywords/Search Tags: Word Sense Disambiguation, Word Embedding, Natural Language Processing