
Research On Text Classification Based On Structure-Optimized Recurrent Neural Network

Posted on: 2020-01-14    Degree: Master    Type: Thesis
Country: China    Candidate: W X Lin    Full Text: PDF
GTID: 2428330590471796    Subject: Control Science and Engineering
Abstract/Summary:
The rapid development and expansion of information resources have created an era of "information overload". Information overload produces large amounts of redundant data, which seriously hinders the effective use of information. The problem is no longer how to obtain information, but how to effectively select, integrate, utilize, and make decisions with it in the face of such volumes, most of which consist of text. Classifying text content is therefore of great significance for resolving this information confusion. This thesis systematically analyses the application scenarios of text classification, the three waves in the development of text classification and natural language processing, and the current state of research on deep learning methods such as recurrent neural networks for text classification. It focuses on the feature representation of text and several commonly used classification methods. Building on a summary of existing results and methods, and addressing the shortcomings of short-text feature extraction and of the global representation learned by recurrent neural networks, the thesis proposes several improvements and obtains effective results. The main research contents are as follows.

First, short texts contain few features and limited information, and pooling operations destroy local spatio-temporal structure, so no pooling is used in the proposed model. A serial-parallel CNN extracts phrase-level features that capture local context and serve as the input to an RNN, and the Gated Recurrent Unit (GRU) is chosen as the basic RNN cell to generate sentence features from the sequence information. An additive margin is introduced into the Softmax classifier to obtain classification features with greater inter-class discrimination. The proposed model is evaluated on the TREC, MR, and Subj text classification datasets. Experimental results demonstrate that the model improves the quality of text feature extraction and the classification accuracy, and that it performs well against GRU, CNN, G-Dropout, and other common models of the same parameter scale.

Second, an RNN is a biased model in which later inputs dominate earlier ones. To optimize the global representation produced by an RNN for document modelling, the convolutional bidirectional recurrent network (CBI-RNN) is introduced for text classification. One convolutional layer and one max-pooling layer extract phrase-level local information from the word embeddings, and a BI-LSTM with global pooling extracts the global information, selecting the features most favorable for classification. Depending on the global pooling scheme, the model variants are named CBI-RNN-Max and CBI-RNN-Att. A feature-concatenation layer is also introduced, and its performance is reported in comparison with the other model variants. The proposed model is evaluated on the Reuters21578-R8 and WebKB text classification datasets. Experimental results indicate that it captures more contextual information and achieves state-of-the-art performance on both datasets.
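To make the first model concrete, the following is a minimal sketch in PyTorch of the pipeline described above: parallel convolutions without pooling extract phrase-level features, a GRU turns them into a sentence feature, and an additive-margin Softmax provides the training loss. The framework choice, layer sizes, kernel widths, margin m, and scale s are illustrative assumptions, not the settings reported in the thesis.

# Hypothetical sketch of the serial-parallel CNN + GRU model with an
# additive-margin Softmax classifier; all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    """Additive-margin Softmax: cos(theta_y) is replaced by cos(theta_y) - m."""
    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # Cosine similarities between L2-normalised features and class weights.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        margin = F.one_hot(labels, cos.size(1)).float() * self.m
        return F.cross_entropy(self.s * (cos - margin), labels)

class CNNGRUClassifier(nn.Module):
    """Parallel convolutions (no pooling) extract phrase features; a GRU reads
    the resulting sequence and its last hidden state is the sentence feature."""
    def __init__(self, vocab_size, emb_dim=128, conv_channels=64,
                 kernel_sizes=(2, 3, 4), hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Padding keeps (roughly) the full sequence length, so no pooling is used.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, conv_channels, k, padding=k // 2)
             for k in kernel_sizes])
        self.gru = nn.GRU(conv_channels * len(kernel_sizes), hidden,
                          batch_first=True)

    def forward(self, tokens):                       # tokens: (B, T)
        x = self.embed(tokens).transpose(1, 2)       # (B, emb_dim, T)
        T = tokens.size(1)
        # Parallel phrase-level features, truncated to a common length T.
        feats = torch.cat([F.relu(c(x))[:, :, :T] for c in self.convs], dim=1)
        _, h = self.gru(feats.transpose(1, 2))       # h: (1, B, hidden)
        return h.squeeze(0)                          # sentence feature

In training, the sentence feature returned by CNNGRUClassifier would be passed to AMSoftmaxLoss together with the labels; at test time the class whose weight vector has the largest cosine similarity to the feature is predicted.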
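The second model can be sketched in the same way. The code below is an illustrative PyTorch rendition of CBI-RNN, assuming one convolution plus max-pooling stage over the word embeddings, a BI-LSTM, and a global pooling layer that is either an element-wise maximum (CBI-RNN-Max) or an attention-weighted average (CBI-RNN-Att); the dimensions and the attention scorer are assumptions rather than the thesis configuration.

# Hypothetical sketch of the CBI-RNN variants; hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBIRNN(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=128,
                 conv_channels=128, kernel=3, hidden=128,
                 pooling="max"):                     # "max" or "att"
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel,
                              padding=kernel // 2)
        self.pool = nn.MaxPool1d(2)                  # local max pooling
        self.bilstm = nn.LSTM(conv_channels, hidden, batch_first=True,
                              bidirectional=True)
        self.pooling = pooling
        self.att = nn.Linear(2 * hidden, 1)          # attention scorer
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, tokens):                       # tokens: (B, T)
        x = self.embed(tokens).transpose(1, 2)       # (B, emb_dim, T)
        x = self.pool(F.relu(self.conv(x)))          # phrase-level features
        out, _ = self.bilstm(x.transpose(1, 2))      # (B, T', 2*hidden)
        if self.pooling == "max":                    # CBI-RNN-Max
            doc = out.max(dim=1).values              # global max pooling
        else:                                        # CBI-RNN-Att
            w = torch.softmax(self.att(out), dim=1)  # (B, T', 1)
            doc = (w * out).sum(dim=1)               # attention pooling
        return self.fc(doc)                          # class logits

The feature-concatenation variant mentioned in the abstract could be obtained by concatenating the max-pooled and attention-pooled vectors before the final linear layer; it is omitted here for brevity.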
Keywords/Search Tags:convolutional neural network, recurrent neural network, text classification, global representation