Research On Micro-blog Retrieval Based On Query Extension

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Xu
Full Text: PDF
GTID: 2348330542989091
Subject: Management Science and Engineering
Abstract/Summary:
As an Internet social platform, micro-blogging is becoming increasingly important. It combines the properties of a social networking site with the attributes of mass media, which makes it a new kind of media and network platform. Users can publish short posts of no more than 140 characters. With hundreds of millions of posts produced every day, the volume of micro-blog data has become very large, and accurately finding the content that interests a user within this mass of information is an important task. Micro-blog search differs considerably from traditional web page search: queries are simple, texts are short, and the content is strongly time-sensitive. Based on these characteristics of micro-blog retrieval, this thesis studies how to improve the results of Twitter retrieval.

First, the thesis introduces the technologies related to query expansion for short text, covering global query expansion and local query expansion. It then proposes a new query expansion method based on semantic similarity and time distribution. To select expansion terms by semantic similarity, word activation force is applied to pseudo-relevance feedback, and the score of each text is added to the weight calculation, which makes the expansion terms more relevant to the original query. To reduce noise, expansion terms are also selected through pseudo-relevance feedback based on the similarity of word time distributions. Finally, the two parts are combined, and the expansion terms are re-ranked according to the overall correlation fluctuation range. This combination expands the query while reducing noise interference.

Once the query has been expanded, a second retrieval is performed, and the retrieved results are ranked and returned to the user. Because micro-blog posts are often too short to match the query, the thesis presents a method that adds query term information and document time information to the scoring of the second retrieval. This improves the re-ranking algorithm and makes the returned documents more relevant to the user's retrieval need. Finally, comparison experiments are designed. The results show that the proposed query expansion method, which combines time-based and content-based expansion, improves retrieval accuracy after query expansion. Adding document time information and query term information to the re-ranking score improves the relevance of the retrieval results, which helps improve user satisfaction with retrieval.
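
The expansion step described above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: score-weighted co-occurrence stands in for the word activation force measure, cosine similarity over per-bin counts stands in for time distribution similarity, and the names `expand_query`, `time_histogram`, and the mixing weight `alpha` are hypothetical rather than taken from the thesis.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def time_histogram(posts, terms, n_bins):
    """Count, per time bin, the feedback posts containing any of `terms`."""
    hist = [0] * n_bins
    for tokens, _score, time_bin in posts:
        if terms & set(tokens):
            hist[time_bin] += 1
    return hist


def expand_query(query, posts, n_bins, k=2, alpha=0.5):
    """Pick k expansion terms from the top-ranked (feedback) posts.

    posts: list of (tokens, first_pass_score, time_bin) tuples.
    alpha: hypothetical weight mixing content evidence and time evidence.
    """
    query_terms = set(query)
    # Content evidence (assumed stand-in for word activation force):
    # co-occurrence with the query, weighted by the post's first-pass score.
    content = Counter()
    for tokens, score, _time_bin in posts:
        if query_terms & set(tokens):
            for term in set(tokens) - query_terms:
                content[term] += score
    if not content:
        return []
    max_content = max(content.values())
    # Time evidence: how closely a candidate's posting times follow the query's.
    query_hist = time_histogram(posts, query_terms, n_bins)
    combined = {}
    for term, weight in content.items():
        temporal = cosine(time_histogram(posts, {term}, n_bins), query_hist)
        combined[term] = alpha * weight / max_content + (1 - alpha) * temporal
    # Sort by combined score, breaking ties alphabetically for determinism.
    ranked = sorted(combined, key=lambda t: (-combined[t], t))
    return ranked[:k]


# Toy feedback set: (tokens, first-pass retrieval score, time bin index).
posts = [
    (["earthquake", "japan", "tsunami"], 3.0, 0),
    (["earthquake", "tsunami", "warning"], 2.0, 0),
    (["earthquake", "movie", "review"], 1.0, 2),
]
print(expand_query(["earthquake"], posts, n_bins=3, k=2))  # ['tsunami', 'japan']
```

In this toy data, "movie" co-occurs with the query only in a low-scoring post from a different time bin, so both kinds of evidence rank it below "tsunami" and "japan". The same pattern of combining a content score with a time score could be applied to the second-pass re-ranking of documents, though the abstract does not specify how the thesis weights the two.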
Keywords/Search Tags: Short Text, Micro-blog Retrieval, Query Expansion, Relevance Feedback, Re-ranking