Comparison And Combination Of Text Classification Based On Word2vec With SVC And AT-LSTM

Posted on: 2019-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: S Q Li | Full Text: PDF
GTID: 2428330566460546 | Subject: Applied Mathematics
Abstract/Summary:
Text categorization is a basic and important task in natural language processing. It assigns natural language text to categories, where the category reflects the intent expressed by the text. Since its introduction, word2vec has been widely used across natural language processing. Based on word2vec, this thesis generates Chinese word vectors from a Chinese Wikipedia corpus and uses support vector machines (SVM) and AT-LSTM to run text classification (sentiment classification) experiments on hotel review data collected during an internship, comparing the results and performance of the two algorithms. It then combines the two algorithms to obtain a model with better classification accuracy. Both methods take the word vectors generated by word2vec as input. The support vector machine needs an additional step that turns word vectors into a sentence vector; this step is difficult and directly determines the final classification quality. AT-LSTM avoids the problem by extracting sentence features within an encoder-decoder framework and then classifying the sentence directly. In the text categorization task, AT-LSTM clearly outperforms an SVM that uses the mean of the word vectors as the sentence vector. However, an SVM whose sentence vectors are generated by a "vote" over word vectors is not inferior to AT-LSTM. The SVM result therefore depends heavily on how the sentence vectors are generated, and AT-LSTM can be regarded as a feature extraction method. The thesis combines the two: it first extracts the semantic vector of a sentence with AT-LSTM and then classifies that vector with a support vector machine. The resulting combined model performs better than either the plain support vector machine or AT-LSTM alone.
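The step from word vectors to a sentence vector can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-dimensional `toy_embeddings`, the `query` vector, and the function names are hypothetical, and the attention pooling here is applied directly to word vectors, whereas AT-LSTM would apply attention to LSTM hidden states. The sketch only contrasts plain averaging with a weighted average, using the Python standard library.

```python
import math

def mean_pool(vectors):
    """Sentence vector as the plain average of its word vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def attention_pool(vectors, query):
    """Sentence vector as a softmax-weighted average of word vectors.

    The weights come from dot products with a query vector (hypothetical
    here; in a trained model it would be learned).
    """
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

# Toy 2-d "word vectors" (assumed values, not real word2vec output).
toy_embeddings = {"room": [1.0, 0.0], "very": [0.0, 0.0], "clean": [0.0, 2.0]}
tokens = ["room", "very", "clean"]
vectors = [toy_embeddings[t] for t in tokens]

sentence_mean = mean_pool(vectors)
sentence_att, att_weights = attention_pool(vectors, query=[0.0, 1.0])
```

Either pooled vector could then be passed to a classifier such as an SVM. Averaging treats every word equally, while the weighted version lets words aligned with the query dominate the sentence vector.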
Keywords/Search Tags: Text Categorization, word2vec, Support Vector Machines, AT-LSTM, encoder-decoder