Research On Affecting Factors Of Word2vec Training Optimization

Posted on: 2019-06-08   Degree: Master   Type: Thesis
Country: China   Candidate: N Wang   Full Text: PDF
GTID: 2428330545951165   Subject: Applied statistics
Abstract/Summary:
In 2013, Google introduced word2vec, a word-vector tool that converts natural-language text into vectors. Its core idea is to represent a large vocabulary in a low-dimensional space, which not only avoids the sparsity and curse of dimensionality caused by one-hot encoding but also captures latent relationships between words, improving the accuracy of language understanding at the level of meaning. Word2vec trains word vectors by mapping the words of a text into a new space and representing each word as a multidimensional vector of continuous real values. Many factors can affect the quality of the word vectors that word2vec produces. This thesis judges word-vector quality by two criteria: the time taken to generate the word vectors, and the accuracy of a text classification model that embeds them. Through a factorial experiment and an orthogonal experiment, it studies how four factors (corpus, model, training algorithm, and vector dimension) influence these two indicators. The aim is to provide a reference for choosing factor levels when training word vectors with word2vec, to find the best combination of factor levels, and to optimize the resulting word vectors. The two experiments yielded the following findings: the corpus used to train the word vectors should be the same as, or as similar as possible to, the text to be processed; if this cannot be satisfied, the breadth of the training corpus should be expanded; and for news texts containing a large number of rare words, the combination of the Skip-gram model and the Hierarchical Softmax algorithm performed better.
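
As a rough illustration of the factorial design described above, the following Python sketch enumerates every combination of the four factors. The factor levels are assumptions made for this example only: the abstract names the factors but not their levels, so the corpus labels and dimension values are placeholders, and CBOW and negative sampling are included simply because they are the standard word2vec alternatives to Skip-gram and Hierarchical Softmax. The helper `to_gensim_kwargs` is hypothetical; it maps a run to keyword names used by gensim 4.x's `Word2Vec` (`sg`, `hs`, `negative`, `vector_size`), which may not be the toolkit the thesis used.

```python
from itertools import product

# Hypothetical factor levels: the abstract lists the four factors only.
FACTORS = {
    "corpus": ["in_domain", "general"],  # placeholder corpus labels
    "model": ["skip-gram", "cbow"],
    "algorithm": ["hierarchical_softmax", "negative_sampling"],
    "dimension": [100, 200, 300],  # placeholder vector sizes
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    levels = (factors[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*levels)]


def to_gensim_kwargs(run):
    """Map one run to gensim-4.x-style Word2Vec keyword arguments (assumed)."""
    use_hs = run["algorithm"] == "hierarchical_softmax"
    return {
        "sg": 1 if run["model"] == "skip-gram" else 0,  # 1 = Skip-gram, 0 = CBOW
        "hs": 1 if use_hs else 0,                       # hierarchical softmax on/off
        "negative": 0 if use_hs else 5,                 # negative samples (0 disables)
        "vector_size": run["dimension"],
    }


runs = full_factorial(FACTORS)
```

Each entry of `runs` would then be trained and scored on the two criteria, training time and downstream classification accuracy. An orthogonal design would evaluate only a balanced subset of these combinations rather than all of them.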
Keywords/Search Tags: Word2vec, Word vector, Orthogonal experiment, Factor optimization, Text categorization