
A Study Of Word Vector Extraction Based On Neural Network

Posted on: 2018-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y F You
Full Text: PDF
GTID: 2348330542461669
Subject: Software engineering
Abstract/Summary:
With the continued development of computer technology, we have entered the era of big data, and a flood of information reaches us through the Internet every day: news from portal sites, posts in microblog friend circles, interviews, and so on. Human language is rich in information, and enabling machines to process, understand, and even analyze it has long been a central difficulty of natural language processing. The most common solution is to process massive corpora with machine learning, build a text model that maps each word to a vector, and then derive the relations between words from the mathematical relations between their vectors. It is therefore particularly important to obtain high-quality word vectors that correctly reflect the links between words. The most popular framework for obtaining word vectors is Word2vec, released by Google in 2013. Word2vec provides two models: the Skip-gram model, which predicts the surrounding context words from the center word, and the CBOW model, which predicts the center word from its context. Both are trained with a simplified shallow neural network and can produce good-quality word vectors efficiently. The main work of this paper is to improve the algorithms of the Word2vec framework and to propose a new framework, as follows:

(1) The projection layer of the original Word2vec framework makes the algorithm insensitive to the order of the words supplied by the input layer. In this paper, we modify the Skip-gram model and the CBOW model of the original framework separately and propose two new models that are sensitive to the word order of the input. Experiments show that, on corpora of the same size, the new models obtain word vectors of higher accuracy, while their time efficiency remains satisfactory.

(2) The original Word2vec models support neither incremental operation nor distributed computation. Once a corpus has been processed and its word vectors obtained, the arrival of a new corpus forces the word vectors to be recalculated over the merged original and new corpora in order to update them, which greatly increases the computation time. To address this problem, we design a method that sums word vectors weighted by word frequency in each corpus and then normalizes the result: word vectors are computed only for the new corpus and are then combined with the original word vectors. Experimental results show that the new method obtains essentially the same word vectors as the original model while greatly improving computation speed, and the resulting word-vector pipeline can support distributed computation.
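The difference between the two Word2vec models described above comes down to which direction the prediction runs. A minimal sketch of how the two models would extract training pairs from a sentence (the function names, the toy sentence, and the window size are illustrative, not taken from the thesis):

```python
def cbow_pairs(tokens, window=2):
    """CBOW: the context words jointly predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: the center word predicts each context word in turn."""
    pairs = []
    for i, center in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, ctx))
    return pairs

sentence = ["the", "cat", "sat", "on", "the", "mat"]
# e.g. the first CBOW pair is (["cat", "sat"], "the"),
# while the first Skip-gram pair is ("the", "cat")
```

Note that both constructions treat the context as an unordered bag of words, which is exactly the order-insensitivity that contribution (1) of this thesis sets out to remove.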
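The frequency-weighted summation idea behind contribution (2) can be sketched as follows. This is a hypothetical illustration of the general technique, not the thesis's actual implementation: for a word that appears in both corpora, its two vectors are averaged with weights proportional to its counts in each corpus, and the result is normalized.

```python
import math

def merge_vectors(v_old, count_old, v_new, count_new):
    """Combine two embeddings of the same word, weighting each by the
    word's frequency in the corpus it came from, then L2-normalizing."""
    total = count_old + count_new
    merged = [(count_old * a + count_new * b) / total
              for a, b in zip(v_old, v_new)]
    norm = math.sqrt(sum(x * x for x in merged)) or 1.0
    return [x / norm for x in merged]

# A word seen 300 times in the old corpus and 100 times in the new one:
# the old vector contributes three times the weight of the new vector.
v = merge_vectors([0.2, 0.4], 300, [0.6, 0.0], 100)
```

Only the new corpus needs to be trained from scratch, and merges for different words are independent of one another, which is what allows the update step to be distributed.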
Keywords/Search Tags:word vector, word2vec, neural network, machine learning, incremental calculation