
Text Classification Research Based On Improved LSTM And Ensemble Algorithm

Posted on: 2020-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Zhai
GTID: 2428330602453904
Subject: Engineering
Abstract/Summary:
Text classification is widely used in content review, advertisement filtering, sentiment analysis, text tagging, false-information detection, and other fields. The core of the technique is text feature representation. Compared with traditional representations based on the bag-of-words model, deep learning word embedding models not only overcome the "curse of dimensionality" in text features but also uncover features that domain experts cannot find. Text classification based on deep learning has therefore become a hot topic in natural language processing. Long Short-Term Memory (LSTM) networks are the mainstream deep learning model for text classification, but current LSTM-based methods suffer from information redundancy and vanishing gradients, which degrade classification performance. To address these problems, this thesis introduces an attention mechanism into LSTM and proposes a method for computing attention probabilities that assigns weights to the hidden states of the word vectors. A new text feature representation is generated from these weights for classification, and two models are proposed: Attention LSTM and Attention Bi-LSTM. By increasing the influence of important words and reducing that of other words, the improved models preserve the useful information in the text and improve classification performance. The improved models are applied to four public corpora: Sohu News, Legal Documents, Reuters News, and IMDB Film Reviews. The experimental results show that both the Attention LSTM model and the Attention Bi-LSTM model improve on the original models to some extent, which demonstrates their feasibility.

In addition, following the ensemble learning idea that multiple models outperform a single model, this thesis proposes a solution to the problem that a voting model ignores the classification performance of its basic models. Attention Bi-LSTM, KNN, Naive Bayes, and Support Vector Machine form the basic model layer; CART decision trees serve as the higher-order classification models that constitute the Bagging layer; and the Soft Voting algorithm constitutes the voting layer. Together, these three layers form the proposed Hierarchical Ensemble Classification model. The model trains the higher-order classification models on the prediction results of the basic models in order to reduce prediction error. The Hierarchical Ensemble Classification model, the Voting model, and the basic models are applied to the same four public corpora: Sohu News, Legal Documents, Reuters News, and IMDB Film Reviews. The experimental results show that the Hierarchical Ensemble Classification model classifies more effectively than Attention Bi-LSTM, the best of the basic models.
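The abstract does not give the thesis's attention probability formula, so the following is only a minimal sketch of a common, generic formulation: each hidden state is scored against a query vector, the scores are normalized with softmax, and the hidden states are combined into one weighted feature vector. The function names, the `tanh` scoring rule, the `query` vector, and the toy numbers are all assumptions made for illustration, not the author's method.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each hidden state by an attention probability and sum them.

    hidden_states: one vector per word (e.g. LSTM outputs), all the same length.
    query: hypothetical scoring vector; in a real model it would be learned.
    """
    # Assumed scoring rule: dot product of tanh(h_t) with the query vector.
    scores = [
        sum(math.tanh(h_i) * q_i for h_i, q_i in zip(h, query))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The text representation is the attention-weighted sum of hidden states.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled


# Toy hidden states for a three-word text (made-up numbers).
hidden_states = [[0.1, 0.0], [2.0, 1.5], [0.2, -0.1]]
query = [1.0, 1.0]
weights, pooled = attention_pool(hidden_states, query)
```

In this toy input the second word has the largest score, so it receives the largest attention weight and dominates the pooled vector. This mirrors the stated goal of raising the influence of important words and lowering that of the others.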
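The layered ensemble can be pictured with a small sketch as well. The code below shows only two generic building blocks under stated assumptions: plain soft voting (averaging class probabilities) and the concatenation of basic-model probabilities into a feature vector that a higher-order classifier could be trained on. It does not implement the CART or Bagging layer, and the function names and probability values are hypothetical.

```python
def soft_vote(prob_rows):
    """Average the class-probability vectors of several models.

    Returns the index of the winning class and the averaged probabilities.
    Every model counts equally, regardless of how accurate it is.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg


def stack_features(prob_rows):
    """Concatenate basic-model probabilities into one feature vector.

    A higher-order classifier (decision trees in the thesis) would be
    trained on vectors like this one instead of on the raw text.
    """
    return [p for row in prob_rows for p in row]


# Hypothetical two-class probabilities from three basic models for one document.
base_probs = [
    [0.90, 0.10],
    [0.40, 0.60],
    [0.45, 0.55],
]
label, avg = soft_vote(base_probs)
features = stack_features(base_probs)
```

Plain soft voting treats every basic model the same, which is the limitation the abstract points to. Feeding the stacked probabilities to a trained higher-order model is one way to let the ensemble learn how much to trust each basic model.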
Keywords/Search Tags: Text Classification, Attention Mechanism, Long Short-Term Memory Networks, Deep Learning, Ensemble Algorithm