A Text Classification Method Based On Deep Learning And Labeled-LDA

Posted on: 2018-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Pang
GTID: 2348330536976271
Subject: Computer Science and Technology
Abstract/Summary:
Text classification has long been a fundamental, active, and challenging problem in natural language processing. As an unsupervised probabilistic model, LDA performs well at mining text semantics, and its labeled extension, Labeled-LDA, uses the strong supervisory signal carried by document labels, which makes the resulting topic distributions more accurate and controllable. Deep learning has developed at an unprecedented pace in recent years, and neural networks have proven effective across many fields. Images, represented as pixel matrices with an inherent spatial structure, are a natural fit for convolutional neural networks (CNNs), which have surpassed human-level accuracy in face recognition and other image tasks. Likewise, with the popularity of word2vec in natural language processing, the inherent sequential nature of text has proven a natural fit for recurrent neural networks (RNNs), which have driven major progress in text classification, dialogue systems, machine translation, and other areas. Under comparable conditions, RNNs outperform CNNs on short-text classification, partly because RNNs model sequences, which matches the sequential character of text. On long texts, however, RNNs can fail to converge because of vanishing gradients and have insufficient memory length; even LSTM struggles with long-text classification. Text exhibits both a spatial structure, as images do, and a sequential structure. To retain both the RNN's strength in sequence modeling and the CNN's strength in modeling spatial structure, this thesis uses a convolutional recurrent neural network, CNN_RNN, and, for long texts, a recurrent network with multiple convolutional layers, MCNN_RNN.

Experiments show that Labeled-LDA clusters text well, can be used for feature selection, and yields a good class distribution for each word. Incorporating this per-word class information, the thesis proposes two convolutional recurrent networks, CNN_RNN_LLDA and MCNN_RNN_LLDA. On datasets with abundant samples, very deep convolutional networks have previously achieved very good results, and attention models, popular in natural language processing over the past two years, have also proven fruitful. Building on both, the thesis presents a deep residual bidirectional attention network, RES_BATT_LLDA. Experiments show that, with the per-word class distribution added, the networks fit the data well and the word vectors are correctly guided toward the directions of the text classes.

The proposed text classification methods based on deep learning and Labeled-LDA are evaluated on several public datasets covering Chinese and English, short and long texts, binary and multi-class tasks, small and large datasets, and sentiment polarity classification. By exploiting the inherent spatial and sequential structure of text, the methods achieve the best results reported to date on seven public datasets.
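To make the CNN_RNN_LLDA idea concrete, here is a minimal, untrained forward-pass sketch. It is not the thesis's architecture. The way the Labeled-LDA information enters the model is an assumption: each word's embedding is concatenated with its label distribution p(label | word). All function names (`concat_features`, `conv1d_relu`, `simple_rnn`, `cnn_rnn_llda_forward`), the layer sizes, the plain Elman RNN (used in place of an LSTM), and the random weights are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch of a CNN_RNN_LLDA-style forward pass (untrained).
# Assumption: a word's input = its embedding + its Labeled-LDA p(label | word).


def concat_features(embeddings, label_dists):
    """Append each word's (assumed) label distribution to its embedding."""
    return [e + d for e, d in zip(embeddings, label_dists)]


def conv1d_relu(seq, kernels, width):
    """Valid 1-D convolution along the word axis, then ReLU.

    Each kernel is a flat list of width * dim weights; one output
    channel per kernel, one output step per window position.
    """
    out = []
    for t in range(len(seq) - width + 1):
        window = [x for row in seq[t:t + width] for x in row]
        out.append([max(0.0, sum(w * x for w, x in zip(k, window)))
                    for k in kernels])
    return out


def simple_rnn(seq, w_in, w_rec, hidden):
    """Elman RNN h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the last h."""
    h = [0.0] * hidden
    for x in seq:
        # The comprehension reads the previous h before h is reassigned.
        h = [math.tanh(sum(w_in[i][j] * x[j] for j in range(len(x)))
                       + sum(w_rec[i][j] * h[j] for j in range(hidden)))
             for i in range(hidden)]
    return h


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]


def cnn_rnn_llda_forward(embeddings, label_dists, n_classes,
                         n_filters=4, width=3, hidden=5, seed=0):
    """Convolution over word features, an RNN over the conv outputs, softmax."""
    rng = random.Random(seed)  # fixed seed -> deterministic random weights
    x = concat_features(embeddings, label_dists)
    dim = len(x[0])
    kernels = [[rng.uniform(-0.5, 0.5) for _ in range(width * dim)]
               for _ in range(n_filters)]
    feats = conv1d_relu(x, kernels, width)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_filters)]
            for _ in range(hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
             for _ in range(hidden)]
    h = simple_rnn(feats, w_in, w_rec, hidden)
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
             for _ in range(n_classes)]
    logits = [sum(w * v for w, v in zip(row, h)) for row in w_out]
    return softmax(logits)


if __name__ == "__main__":
    emb = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.4], [-0.2, 0.1], [0.3, 0.3]]
    lda = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
    print(cnn_rnn_llda_forward(emb, lda, n_classes=2))  # two probabilities summing to 1
```

The convolution captures local, image-like structure over word windows, and the recurrent layer then models the order of those local features. This is the combination the abstract motivates. A real model would use trained weights, an LSTM or GRU, and pooling, and it would follow whatever fusion scheme the thesis actually specifies.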
Keywords/Search Tags: Text Classification, Labeled-LDA, Word2vec, Deep Residual Network, Attention Model