Text Modeling And Classification Based On Word2vec

Posted on: 2017-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: G C Feng
GTID: 2358330503981871
Subject: Software engineering
Abstract/Summary:
Text classification is an important research area in text mining and one of the key technologies of natural language processing and machine learning. With the explosive growth of text on the Internet, automatic text classification is being used more and more widely. In recent years, machine learning methods have been applied to automatic text classification. Compared with traditional text classification models, machine learning methods have improved both classification results and flexibility.

Text modeling is the cornerstone of text classification. At present, the main text modeling methods are the vector space model and the topic model. The vector space model is the most widely used, but it has shortcomings such as high dimensionality and sparsity, and it does not handle synonymy and polysemy. Compared with the vector space model, the topic model can reduce dimensionality effectively and discover latent topics across documents, and it also addresses the semantic relationships between words. However, the topic model needs a large number of samples to learn from, and its training is difficult and time-consuming, which reduces the efficiency of text classification.

This thesis studies text categorization techniques and proposes a text modeling method, word2vec_k-means. Compared with traditional text modeling methods, word2vec_k-means significantly improves classification accuracy, and experiments verify the effectiveness of the method. The research work of this thesis covers the following three aspects:

(1) A study of the text classification process and its related technologies, with an analysis of the advantages and disadvantages of the common text modeling methods.
(2) Word embeddings are obtained with the deep learning model word2vec, and on this basis a new text modeling method, word2vec_k-means, is proposed. With these two text modeling methods, the dimensionality of the text representation can be reduced effectively, the problem of semantics between synonyms is addressed, and the training time is greatly reduced.
(3) Text modeling is completed using the feature items obtained through word2vec. On this basis, the SVM classification algorithm is used, combining the strong semantic representation capability of the deep learning model word2vec with the classification ability of SVM. The results show that, compared with traditional text modeling, the word2vec_k-means text modeling method improves the micro-averaged F1 value, the macro-averaged F1 value, and the classification accuracy.
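The abstract does not spell out how word2vec_k-means turns word embeddings into a document representation. One plausible reading, sketched here purely as an assumption, is to cluster the word vectors with k-means and describe each document by the share of its words that fall into each cluster, so that synonyms land in the same feature and the document vector has only k dimensions. The toy 2-D vectors, the names `WORD_VECTORS`, `kmeans`, and `document_vector`, and the farthest-point initialisation are all hypothetical choices for illustration; a real pipeline would use embeddings from a trained word2vec model.

```python
from collections import Counter
from math import dist

# Hypothetical hand-made 2-D vectors standing in for trained word2vec embeddings.
WORD_VECTORS = {
    "good": (0.90, 0.10),
    "great": (0.85, 0.15),
    "excellent": (0.95, 0.05),
    "bad": (0.10, 0.90),
    "poor": (0.15, 0.85),
    "terrible": (0.05, 0.95),
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to the vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(vector, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_cluster(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def document_vector(tokens, word_to_cluster, k):
    """Fraction of known tokens per cluster; unknown tokens are ignored."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(k)]


K = 2
centroids = kmeans(list(WORD_VECTORS.values()), K)
word_to_cluster = {w: nearest_cluster(v, centroids) for w, v in WORD_VECTORS.items()}

doc_a = document_vector("good great service".split(), word_to_cluster, K)
doc_b = document_vector("excellent good".split(), word_to_cluster, K)
doc_c = document_vector("terrible poor bad".split(), word_to_cluster, K)
```

In this sketch `doc_a` and `doc_b` share no word other than "good" yet receive the same K-dimensional vector, while `doc_c` falls in the other cluster. Vectors of this kind would then be passed to an SVM classifier, which is the step the abstract describes next and which is not shown here.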
Keywords/Search Tags: word2vec, text modeling, text classification, SVM