
Text Representation And Classification Based On Deep Reinforcement Learning

Posted on: 2020-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: T T Wang
Full Text: PDF
GTID: 2428330578457212
Subject: Computer technology
Abstract/Summary:
Text classification is the core of text mining and plays an important role in spam detection, news topic division, and information retrieval. The key to improving text classification performance is to establish an effective text representation and classification model. When extracting text features, current text representation and classification models rely on manually deleting uninformative words or on hand-built parse trees to divide phrase structure, so they cannot learn these steps autonomously. In recent years, progress has been made in the use of deep learning for text representation and classification. Combining the decision-making ability of reinforcement learning with the perceptual ability of deep learning, this thesis applies deep reinforcement learning to deep-learning-based text representation and classification models, which can then learn autonomously to extract task-related words and divide phrase structure. Text representation and classification are studied at the word level and at the phrase level respectively. The main work of this thesis is as follows:

(1) A word-level text representation and classification model based on deep reinforcement learning, ABLCNN-Word, is designed. The ABLCNN-Word model uses the policy network of a deep reinforcement learning policy gradient algorithm to learn autonomously which words in a sentence are related to the classification task, deciding for each word whether to delete or retain it. A bidirectional recurrent neural network then learns the forward and backward word order information of the extracted sentence. This word order information is fed into a convolutional neural network, whose convolution operations produce the text feature representation, which is then classified by Softmax. Experiments show that the accuracy of the ABLCNN-Word model on the MPQA, CR, MR, Subj, and TREC datasets is 2.00%, 2.79%, 0.55%, 0.36%, and 2.80% higher, respectively, than that of the ABLCNN model without deep reinforcement learning.

(2) A phrase-level text representation and classification model based on deep reinforcement learning, DBLCNN-Phrase, is designed. The DBLCNN-Phrase model uses the policy network to predict the position of the phrase in which each word is located and thereby to divide the phrase structure of the sentence autonomously. Here, a phrase structure refers to a substructure of the sentence whose words are intrinsically associated. A two-layer bidirectional recurrent neural network represents the word order information of the word layer and of the phrase layer respectively. The convolutional network then uses the word order information of the phrase layer to obtain a further text representation, which is classified with Softmax. Experiments show that the accuracy of the DBLCNN-Phrase model on the MPQA, CR, MR, Subj, and TREC datasets is 1.57%, 1.22%, 1.20%, 1.14%, and 2.00% higher, respectively, than that of the DBLCNN model without deep reinforcement learning.

On the MPQA, CR, MR, and Subj datasets, the accuracy of the phrase-level DBLCNN-Phrase model is 2.5%, 0.2%, 0.6%, and 0.9% higher, respectively, than that of the word-level ABLCNN-Word model. This indicates that the DBLCNN-Phrase model produces a richer text representation once phrase structure is considered and performs better in text representation and classification tasks. On this basis, the thesis also compares the ABLCNN-Word and DBLCNN-Phrase models with existing models such as the ACNN (BiLSTM) model and the AdaSent model. Experiments show that the ABLCNN-Word and DBLCNN-Phrase models achieve the highest accuracy on the MPQA and CR datasets. The accuracy of the proposed models also improves to varying degrees on the other three datasets.
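The word-level delete-or-retain decision described in (1) can be illustrated with a minimal policy gradient (REINFORCE) sketch. This is not the thesis's implementation. The policy here is assumed to be one learnable logit per word rather than a neural policy network, and `toy_reward` is a hypothetical stand-in for the feedback that the bidirectional recurrent and convolutional classifier would provide. All function names, the example sentence, and the hyperparameters are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_actions(tokens, logits, rng):
    """Sample retain (1) or delete (0) for each token from a Bernoulli policy."""
    return [1 if rng.random() < sigmoid(logits[tok]) else 0 for tok in tokens]


def toy_reward(tokens, actions, informative):
    """Hypothetical reward (assumption): +1 for each informative word retained,
    -1 for each other word retained, averaged over the sentence length.
    In the thesis the reward signal would come from the classifier instead."""
    score = 0
    for tok, action in zip(tokens, actions):
        if action == 1:
            score += 1 if tok in informative else -1
    return score / len(tokens)


def reinforce_update(tokens, actions, reward, baseline, logits, lr):
    """One REINFORCE step: logit += lr * (reward - baseline) * d log pi(a) / d logit."""
    advantage = reward - baseline
    # Freeze the probabilities before updating so every token uses the same policy.
    probs = {tok: sigmoid(logits[tok]) for tok in set(tokens)}
    for tok, action in zip(tokens, actions):
        # For a Bernoulli policy with p = sigmoid(z), d log pi(a) / dz = a - p.
        logits[tok] += lr * advantage * (action - probs[tok])


def train(tokens, informative, episodes=2000, lr=0.5, seed=0):
    """Train per-word retain logits on one sentence and return them."""
    rng = random.Random(seed)
    logits = {tok: 0.0 for tok in tokens}  # start at 50% retain probability
    baseline = 0.0
    for _ in range(episodes):
        actions = sample_actions(tokens, logits, rng)
        reward = toy_reward(tokens, actions, informative)
        reinforce_update(tokens, actions, reward, baseline, logits, lr)
        # Moving-average baseline, updated after the step, to reduce variance.
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


if __name__ == "__main__":
    sentence = ["the", "film", "was", "truly", "wonderful"]
    learned = train(sentence, informative={"film", "wonderful"})
    for tok in sentence:
        print(f"{tok:>10s}  P(retain) = {sigmoid(learned[tok]):.3f}")
```

Under this toy reward, the retain probability rises for the words marked informative and falls for the others. The same sample, score, and update loop would apply if the per-word logits were replaced by a neural policy network and the reward by a classifier's output.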
Keywords/Search Tags:Text Representation, Text Classification, Deep Learning, Deep Reinforcement Learning