
Research On The Model Of Word Embedding Based On Word2Vec

Posted on: 2019-10-19    Degree: Master    Type: Thesis
Country: China    Candidate: Z Zheng    Full Text: PDF
GTID: 2428330572952544    Subject: Software engineering
Abstract/Summary:
The Word2Vec model is a neural network model that converts words into word vectors, and it is widely used in Natural Language Processing tasks such as sentiment analysis and machine question answering. However, word vectors produced by the Word2Vec model do not reflect context-dependent polysemy, and the model cannot generate vectors for out-of-vocabulary (OOV) words. Based on contextual similarity information in the document and the Word2Vec model, this paper proposes a word-vector generation model, called Word2Vec-ACV, that produces context-appropriate vectors for OOV words. First, the words in the document are stored as vectors in a co-occurrence matrix. Second, the co-occurrence matrix is normalized to obtain average context word vectors, which together form an average context word vector matrix. Finally, this matrix is multiplied by the weight matrix trained by the Word2Vec model using the continuous bag-of-words (CBOW) architecture with Hierarchical Softmax, yielding the Word2Vec-ACV word vectors. The average context word vectors are divided into two kinds: the global average context word vector (Global ACV) and the local average context word vector (Local ACV); a weighted combination of the two forms a new average context word vector matrix, so that words are expressed effectively in vector form. Experiments on word-analogy tasks and named entity recognition (NER) tasks show that the Word2Vec-ACV model outperforms the Word2Vec model in the accuracy of word-vector representation and provides a word-vector representation method that creates context-aware vectors for OOV words.
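The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the author's actual implementation: the toy corpus, the window size of 1, and the randomly initialized stand-in weight matrix `W` (which in the thesis would come from CBOW + Hierarchical Softmax training) are all assumptions made for demonstration.

```python
import numpy as np

# Toy corpus; in the thesis this would be the full training document set.
corpus = [["deep", "learning", "models"],
          ["learning", "word", "vectors"],
          ["word", "vectors", "help", "models"]]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Step 1: build a symmetric co-occurrence matrix (window size 1, assumed).
C = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Step 2: row-normalize to get the average context word vector matrix.
row_sums = C.sum(axis=1, keepdims=True)
A = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

# Step 3: multiply by a Word2Vec weight matrix W of shape (V, d).
# Here W is random; in the model it is the trained CBOW weight matrix.
d = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))
acv_vectors = A @ W   # one context-based vector per vocabulary word

# The same recipe handles an OOV word: average the trained vectors of
# the in-vocabulary words observed in its context.
oov_context = ["word", "models"]
oov_vec = W[[idx[w] for w in oov_context]].mean(axis=0)
```

The Global ACV vs. Local ACV distinction in the abstract would correspond to computing `A` over the whole corpus versus over a local window around each occurrence, then mixing the two with a weight.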
Keywords/Search Tags: Word2Vec, word vector, OOV, co-occurrence matrix, ACV