Research On Query Expansion Of Twitter Data Information

Posted on: 2018-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Ma
Full Text: PDF
GTID: 2348330515497059
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, social networks such as Twitter play an increasingly important role in people's lives. Twitter data is generated around the world at every moment, so it is very important to filter out the information that satisfies a user's needs from this large volume of data. Query expansion is widely used in tweet retrieval and can address this problem effectively. Query expansion consists mainly of two parts: the first selects the tweets associated with the original query words as a corpus; the second filters the corpus for the words most relevant to the original query and uses them as expansion words. Traditional methods mainly use algorithms such as BM25, VSM, and TF-IDF to compare the relevance of the original query with each tweet and screen out the tweets that meet the user's needs as the corpus. This approach has two shortcomings: first, relevant tweets that contain few query words are missed; second, irrelevant tweets that contain many query words are mistakenly selected. To address these problems, this thesis makes the following contributions:

(1) The thesis proposes, designs, and implements a query expansion method based on tweet clustering. This method improves the process of screening text as a corpus by replacing the traditional one-by-one comparison of each tweet with the original query words. It first clusters the tweets, so that each resulting cluster contains the tweets with the same semantics. It then compares the relevance of each tweet cluster to the original query and selects the cluster that best meets the user's needs. This addresses the problem of missing tweets that contain few query words. The reported improvements are 11.4% and 13.9% over the BM25 algorithm, 14.9% and 15.3% over the VSM algorithm, and 15.8% and 13.7% over the TF-IDF algorithm.

(2) The thesis proposes a query expansion method based on topics. This method addresses the topic drift caused by irrelevant tweets that contain many query words, so that such tweets are effectively filtered out. It divides the tweets by topic and selects the set of tweets under the topic of the user's query as the corpus, effectively removing tweets that contain the query terms but are off-topic. The reported improvements are 16.2% and 13.9% over the BM25 algorithm, 16.7% and 17.3% over the VSM algorithm, and 17.7% and 15.6% over the TF-IDF algorithm.

(3) The thesis tests the application of the topic division method and the tweet clustering method to query expansion separately. It analyzes the advantages and disadvantages of the two methods and uses both to improve the retrieval metrics.
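
The clustering-based method described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the similarity measure, or the rule for picking expansion words, so everything below is an assumption made for illustration: TF-IDF vectors with cosine similarity, a greedy single-pass clustering with a hypothetical `threshold`, and expansion words ranked by their summed TF-IDF weight in the selected cluster. The function names (`expand_query`, `single_pass_cluster`) and the sample tweets are likewise invented and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(vectors, threshold=0.2):
    """Greedy clustering (an assumed stand-in): each tweet joins the most
    similar existing cluster, or starts a new one if none reaches threshold."""
    clusters = []  # each cluster: {"members": [indices], "centroid": summed vector}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            best = {"members": [], "centroid": {}}
            clusters.append(best)
        best["members"].append(i)
        for t, w in vec.items():
            best["centroid"][t] = best["centroid"].get(t, 0.0) + w
    return clusters


def expand_query(query, tweets, k=3, threshold=0.2):
    """Pick the cluster most similar to the query as the corpus, then return
    the k highest-weighted non-query terms in it and the corpus itself."""
    docs = [tokenize(t) for t in tweets]
    vectors, idf = tfidf_vectors(docs)
    q_terms = tokenize(query)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(q_terms).items()}
    clusters = single_pass_cluster(vectors, threshold)
    best = max(clusters, key=lambda c: cosine(q_vec, c["centroid"]))
    candidates = {t: w for t, w in best["centroid"].items() if t not in q_terms}
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return ranked[:k], [tweets[i] for i in best["members"]]


# Invented sample data: three phone tweets and two baking tweets.
SAMPLE_TWEETS = [
    "apple unveils new iphone camera",
    "new iphone battery review",
    "iphone camera review shows upgrade",
    "apple pie recipe with cinnamon",
    "cinnamon apple pie baking tips",
]

terms, corpus = expand_query("iphone review", SAMPLE_TWEETS)
print(terms)
print(corpus)
```

In this toy run the first tweet contains only one of the two query words, yet it ends up in the selected corpus because it is grouped with the other phone tweets before the comparison with the query. That is the behaviour the abstract attributes to the clustering step. The topic-based method would replace the grouping step with a topic assignment; no topic model is shown here because the abstract does not specify one.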
Keywords/Search Tags: query expansion, tweets feedback, tweets clustering, tweets retrieval