
Research On Time Series Twitter Stream Text Classification Based On Deep Reinforcement Learning

Posted on: 2019-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y B Wang  Full Text: PDF
GTID: 2428330566477990  Subject: Computer Science and Technology
Abstract/Summary:
Twitter is a typical representative of social networks. Compared with traditional media such as magazines, newspapers, and online blogs, tweets have several distinctive features: larger volume, shorter texts, higher real-time performance, wider coverage, and more noise. These characteristics make efficient retrieval of tweet information difficult, and most users do not have time to browse every piece of information. How to reduce noise in a timely manner and extract valuable information from tweets is becoming a hot topic in the field of Natural Language Processing.

This thesis proposes an algorithm named DQN-TC (Deep Q-Network for Text Classification), based on deep reinforcement learning, for text classification of Twitter streams. The algorithm fuses a deep neural network with reinforcement learning. Reinforcement learning has a strong capacity for autonomous learning: the agent tries different actions according to the current state in order to obtain the maximum expected feedback value from the environment, and then executes the chosen action to update the current state. During experience replay, the model takes the input at the current moment as the state of the agent. In this thesis, that state is the vector representation of the tweets, and the tweet vector at the next moment is the observed state. The other innovation of the model is the use of a deep neural network architecture, consisting of a recurrent neural network and a fully connected layer, as the approximator of the action value function. The recurrent neural network takes the time series information and semantic information contained in the text of the tweet stream as input and generates a high-dimensional abstract representation of the input sequence; the fully connected layer then outputs the Q value of each corresponding action. This network is used to estimate the action value function in reinforcement learning in order to determine which action to take next, that is, whether to filter the tweet text.

In this thesis, we crawl the real TREC 2016 real-time summary dataset. After a thorough cleaning of the crawled tweets, a vector representation of the text is generated. A series of experiments was then carried out to evaluate the effectiveness of the proposed model, ranging from a simple cosine similarity calculation to an SVM-based classification algorithm. The results show that conventional machine learning algorithms, which extract text features to train the model, cannot achieve the desired results. Next, based on the time series information in the Twitter stream text, the thesis adopts an algorithm based on the LSTM model and obtains better results than the previous two models. Finally, building on these algorithms, the thesis uses the Deep Q-Network (DQN) algorithm and obtains convincing experimental results, verifying the effectiveness of the proposed DQN-TC algorithm.
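The architecture described above, a recurrent network followed by a fully connected layer that outputs one Q value per action, can be illustrated with a minimal sketch. This is not the thesis implementation: it swaps in a plain tanh recurrent cell with untrained random weights, and the names `RecurrentQNetwork`, `epsilon_greedy`, `td_target`, the two action labels, the reward value, and all dimensions are assumptions made only for illustration.

```python
import math
import random

# Hypothetical action labels: keep the tweet or filter it out.
ACTIONS = ("keep", "filter")


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentQNetwork:
    """Plain tanh RNN followed by a fully connected layer.

    Assumption: the thesis only says "recurrent neural network"; a simple
    tanh cell is used here to keep the sketch short.
    """

    def __init__(self, input_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = make_matrix(hidden_dim, input_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = make_matrix(n_actions, hidden_dim, rng)

    def q_values(self, tweet_vectors):
        """Read a time-ordered window of tweet vectors, return one Q value per action."""
        h = [0.0] * self.hidden_dim
        for x in tweet_vectors:
            a = matvec(self.w_in, x)
            b = matvec(self.w_rec, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        # Fully connected layer maps the final hidden state to Q values.
        return matvec(self.w_out, h)


def epsilon_greedy(q, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda i: q[i])


def td_target(reward, next_q, gamma, done):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


rng = random.Random(42)
net = RecurrentQNetwork(input_dim=4, hidden_dim=3, n_actions=len(ACTIONS))

# Toy data: the state is a window of 5 tweet vectors; the next state
# slides the window forward by one newly arrived tweet.
window = [[rng.random() for _ in range(4)] for _ in range(5)]
next_window = window[1:] + [[rng.random() for _ in range(4)]]

q = net.q_values(window)
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
# The reward of 1.0 is a placeholder; the abstract does not define the reward.
target = td_target(reward=1.0, next_q=net.q_values(next_window), gamma=0.9, done=False)
print(ACTIONS[action], q, target)
```

In a full DQN training loop, transitions of the form (state, action, reward, next state) would be stored in a replay buffer and the network weights updated to move the predicted Q value toward `target`; that training step is omitted here.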
Keywords/Search Tags: Twitter, Deep Reinforcement Learning, DQN, Time series, DQN-TC