The Research Of Text Classification Based On Deep Learning

Posted on: 2019-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: D D Pang
Full Text: PDF
GTID: 2428330545990162
Subject: Computer technology
Abstract/Summary:
In the era of information explosion, efficiently obtaining valuable information from massive amounts of text has become a key issue in natural language processing, and it has also promoted the development of text classification technology. Text classification is mainly divided into three modules: preprocessing, feature extraction, and classification. After preprocessing, text representation is the first key step of text classification and also its foundation. Traditional text representation methods usually use words as the basic unit of text, which not only easily loses semantic information but also easily leads to high-dimensional, sparse text features. At present, most applications of text classification are based on statistical learning or machine learning methods. However, when faced with feature-rich text data, the generalization ability of text classifiers based on these traditional methods is easily reduced. Deep learning, because of its layered network structure, can abstract features layer by layer, and it can address the problems currently faced by text classification.

The sentiment classification method proposed in this thesis uses a tensor space model to quantify the text data. On the basis of the STM model, it integrates the LSTM neural network and proposes the L-STM algorithm model. The vector sequence is used as the input of the LSTM network for higher-level optimization, in order to reduce the number of iterations needed to solve for the optimal parameters. The experimental results show that the tensor-space-based L-STM model not only effectively alleviates overfitting on text data but also reduces the running time of the text classifier. Compared with current mainstream sentiment classification methods, the tensor-space-based L-STM text classifier achieves higher classification accuracy.

The hybrid model is a five-layer deep neural network structure applied to text classification. The first two layers form a sparse autoencoder, which takes word vectors produced by Word2Vec as the original input and performs initial feature extraction. The extracted features are then fed into the middle two layers, a deep belief network, for higher-level feature extraction, and a softmax classifier performs the final text classification. The effects of the number of hidden-layer nodes and the number of fine-tuning iterations on classification performance were tested, and the model was compared with B-CNN, DBN, improved SVM, and SVM classifiers on the same data set. The results show that the text classifier based on the S-DBN model has better classification performance and higher accuracy, which promotes the development of text classification technology. Finally, a hybrid classification system based on the above two classification models is designed and implemented.
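
The five-layer structure described above (two sparse-autoencoder layers, two deep-belief-network layers, and a softmax output) can be illustrated with a minimal forward-pass sketch. This is not the thesis's implementation: the layer sizes, the random weights, and the names `init_layer`, `dense`, and `forward` are assumptions made only for illustration, and the unsupervised pretraining of the autoencoder and the deep belief network, as well as fine-tuning, are omitted. The input list stands in for a document vector built from Word2Vec word vectors.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used by the hidden layers in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(doc_vector, hidden_layers, output_layer):
    """Pass a document vector through the hidden stack, then softmax."""
    hidden = doc_vector
    for weights, biases in hidden_layers:
        hidden = dense(hidden, weights, biases, sigmoid)
    weights, biases = output_layer
    scores = dense(hidden, weights, biases, lambda z: z)
    return softmax(scores)


rng = random.Random(0)

# Assumed sizes: 8-dim input, two "autoencoder" layers (6, 5),
# two "DBN" layers (4, 3), and two output classes.
sizes = [8, 6, 5, 4, 3]
hidden_layers = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(4)]
output_layer = init_layer(sizes[-1], 2, rng)

# Stand-in for a Word2Vec-derived document vector (random here).
doc_vector = [rng.uniform(-1.0, 1.0) for _ in range(sizes[0])]

probs = forward(doc_vector, hidden_layers, output_layer)
print(probs)  # two class probabilities that sum to 1
```

With untrained weights the two probabilities are close to each other; in a trained model the pretrained layers would supply the weights that this sketch draws at random.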
Keywords/Search Tags: text classification, sentiment classification, word vector, sparse autoencoder, deep belief network