Text Retrieval Based On Real-time Twitter Streaming

Posted on: 2019-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xiong
GTID: 2428330548455553
Subject: Computer application technology
Abstract/Summary:
Information retrieval (IR) [1] is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing. Automated information retrieval systems are used to reduce what has been called "information overload". As social networks [2] become a part of daily life, major social platforms such as Facebook, Twitter, and Weibo generate large volumes of textual information. With social media platforms such as Twitter attracting the interest of researchers in real-time text processing as well as in the social sciences, identifying text relevance has become an important field of study and a hot topic in short-text information retrieval. TREC, an international conference on information retrieval evaluation, added this task as a track in 2015. The core problem of this paper is as follows: a user of a social platform specifies a retrieval target (the topic keywords, a description of the topic, and a description of the expected results), and the system then retrieves topic-related information from the social media stream in real time. To address this problem, we studied and implemented two methods, one based on similarity and one based on deep reinforcement learning. In the similarity-based approach, we measured the relevance between a tweet and a given topic using several factors, namely count, cosine, and distance, and selected the most relevant tweets by applying a static threshold to each model. In the deep reinforcement learning approach, the task is treated as a sequential decision problem in which the model decides, for the current text, whether to "skip" or "pick out". The main contributions of this paper are as follows: (1) A real-time text retrieval framework based on the Twitter stream was designed, and the function and implementation of its modules were elaborated. (2) For the core text matching model in the framework, a method based on similarity metrics and a static threshold was implemented, and its effectiveness was verified through experiments. (3) The text matching model based on deep reinforcement learning was improved: a convolutional neural network (CNN) is used as the basic framework of the strategy network, and the optimal decision model for similar texts is trained using Double-DQN and Dueling-DQN.
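The similarity-based approach described in the abstract can be illustrated with a minimal sketch. It covers only the cosine factor with a static threshold, and everything concrete in it is an assumption made for illustration rather than a detail taken from the thesis: the bag-of-words term-frequency representation, the tokenizer, the threshold value of 0.3, and the function names `tokenize`, `cosine_similarity`, and `filter_stream`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated
    return dot / (norm_a * norm_b)


def filter_stream(topic, tweets, threshold=0.3):
    """Yield (tweet, score, decision) for each tweet in arrival order.

    The threshold is static: it is fixed up front and never adapted
    while the stream is processed. The value 0.3 is hypothetical.
    """
    topic_vec = Counter(tokenize(topic))
    for tweet in tweets:
        score = cosine_similarity(topic_vec, Counter(tokenize(tweet)))
        decision = "pick out" if score >= threshold else "skip"
        yield tweet, score, decision


if __name__ == "__main__":
    stream = [
        "new paper on deep reinforcement learning",
        "lunch was great today",
    ]
    for tweet, score, decision in filter_stream("deep reinforcement learning", stream):
        print(f"{decision:8s} {score:.3f} {tweet}")
```

Each tweet is scored and decided as it arrives, without looking ahead, which mirrors the real-time constraint. The "skip" and "pick out" labels are borrowed from the abstract's description of the reinforcement learning approach, which makes the same per-tweet decision with a learned policy in place of the fixed threshold.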
Keywords/Search Tags:Information Retrieval, Similarity, Deep reinforcement learning, Real-time text stream, Text matching model, CNN