
Study on a Hierarchical Attention Network Model Based on Reinforcement Learning and Text Sentiment Classification

Posted on: 2019-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2428330566486428
Subject: Computational Mathematics
Abstract/Summary:
With the rapid growth of social media and e-commerce platforms, it has become common for users around the world to express their opinions and feelings on the Internet, and a huge amount of unstructured text data is created every day. Analyzing this unstructured text with Natural Language Processing (NLP) techniques and identifying the sentiment tendencies it expresses can provide powerful support for managing public opinion, handling after-sale customer feedback, and informing other business decisions. Research on text sentiment classification therefore has significant social importance and great business value.

In NLP, stop words are traditionally removed by applying a manually maintained stop-word list, but this approach is unprincipled, and it is hard to find a single stop-word list that suits every context. In addition, text sentiment classification can be carried out at different levels, such as the document, sentence, or phrase level. For document-level sentiment classification, the hierarchical attention network (HAN) model based on deep learning has achieved strong classification accuracy in recent years, but it has limitations: the Gated Recurrent Unit (GRU) used for sequence learning depends heavily on text structure, yet the GRU-based hierarchical attention network does not take the structure of the text into account. This paper therefore proposes two improved models.

First, the ID-HAN model replaces the bottom layer of the HAN model with a reinforcement learning model that learns whether each word of a segmented sentence should be dropped, which allows the stop-word list to be extracted in a principled way. A Long Short-Term Memory (LSTM) network then builds a vector representation of the sentence from the remaining words, and a highway connection is added to the upper layers that encode the sentence sequence into a document vector, so that the reinforcement learning model at the bottom can be trained adequately. Compared with an LSTM-BiGRU model in which stop words are removed manually, the ID-HAN model achieves better classification accuracy on the test data sets.

Second, the bottom layer of the HS-HAN model is also a reinforcement learning model, so that it can learn the internal structure of sentences and adjust itself accordingly. The model applies stacked LSTM layers to extract phrase structure from words and to learn sentence structure from phrases. The resulting sentence vectors, which carry the extracted structure information, are then passed to layers with highway connections that encode the sentence sequence into a document vector. Compared with the HAN model and the Struc-att model, which are popular models for document-level sentiment classification, the HS-HAN model achieves better classification accuracy on English text.
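To make the ID-HAN bottom layer concrete, the following is a minimal sketch of the idea described above: a policy network decides, per word, whether to keep or drop it (the reinforcement-learning-based stop-word removal), an LSTM encodes the kept words into a sentence vector, and a highway-style gate combines that vector with a bypass path. It assumes PyTorch; all class names, layer sizes, and the exact form of the highway gate are illustrative assumptions, not details taken from the thesis.

import torch
import torch.nn as nn

class WordDropPolicy(nn.Module):
    """Per-word keep/drop policy, trained in a REINFORCE-like fashion (hypothetical)."""
    def __init__(self, embed_dim, hidden_dim=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, word_embs):                        # (batch, seq, embed_dim)
        keep_prob = self.scorer(word_embs).squeeze(-1)   # (batch, seq)
        keep = torch.bernoulli(keep_prob)                # sampled keep/drop actions
        log_prob = (keep * keep_prob.clamp_min(1e-8).log()
                    + (1 - keep) * (1 - keep_prob).clamp_min(1e-8).log())
        return keep, log_prob.sum(dim=1)                 # actions and their log-probability

class SentenceEncoder(nn.Module):
    """LSTM over the kept words plus a highway-style connection."""
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, hidden_dim)     # bypass path projection
        self.gate = nn.Linear(hidden_dim, hidden_dim)    # highway transform gate

    def forward(self, word_embs, keep_mask):
        kept = word_embs * keep_mask.unsqueeze(-1)       # zero out dropped words
        _, (h_n, _) = self.lstm(kept)
        lstm_vec = h_n[-1]                               # (batch, hidden_dim)
        avg_vec = self.proj(word_embs.mean(dim=1))       # bypass: mean of all word embeddings
        t = torch.sigmoid(self.gate(lstm_vec))
        return t * lstm_vec + (1 - t) * avg_vec          # gated sentence vector

In training, the sampled log-probabilities would be scaled by a reward derived from the downstream classification result, in the usual policy-gradient way; the abstract does not specify the reward or the sentence-level attention details, so those parts are omitted here.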
Keywords/Search Tags: Reinforcement Learning, LSTM, Hierarchical, Highway Connection, Attention, Sentiment Classification