
The Research of the Microblog Retrieval Model Integrating User Interest and Mixed Estimation

Posted on: 2020-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: X T Zhang
Full Text: PDF
GTID: 2428330596485193
Subject: Management Science and Engineering
Abstract/Summary:
According to the 42nd Statistical Report on Internet Development in China released by the China Internet Network Information Center, as of June 2018 the number of microblog users in China had reached 337 million, accounting for 42.1% of all netizens. This huge user base has gradually made microblogs an important way for people to obtain and share information. To extract useful information from the huge volume of microblog data, microblog retrieval has become an important part of microblog services. Although microblog retrieval belongs to the category of text retrieval, it differs from traditional text retrieval in two main aspects: ranking principle and search data. In terms of ranking principle, microblog retrieval must consider factors beyond the content similarity between query and document, such as time, user interest, and the quality of posts. In terms of search data, microblog retrieval targets microblog documents, which are typically short in length and sparse in content.

Taking these characteristics into account, this paper proposes a microblog retrieval model, based on the query likelihood model, that integrates user interest and mixed estimation. The new model improves two components of the traditional query likelihood model: the document prior probability and the estimation of the document language model. The specific work is reflected in two aspects:

(1) Document prior probability: first, the user's interest behaviors on the microblog platform are quantified to obtain the user's interest blog library; then, the similarity between this library and a microblog is calculated and used to improve the prior probability of that microblog document. As a result, microblogs that interest the user receive a higher prior probability, which to a certain extent meets the user's demand for personalized retrieval.

(2) Document language model estimation: first, the content relevancy between microblogs is obtained from their text content; then, the degree of interaction between Weibo users is obtained by quantifying their interaction behaviors; finally, content relevancy and user interaction are mixed to obtain a set of related documents, which is used as a smoothing term when estimating the language model of a microblog. To some extent, this alleviates the influence of microblog sparsity on language model estimation.

Because current authoritative test sets do not readily meet the experimental requirements, this paper uses real data crawled from Sina microblog to verify the effectiveness of the research. First, the original data of 661,845 microblogs were cleaned and preprocessed, and the test set was constructed according to the standard test set construction method. Then, the retrieval performance of different microblog retrieval models was compared on the test set. The experimental results show that the overall model outperforms the stage-wise work in this paper on both P@k and MRR, and that the proposed model is also superior to mainstream microblog retrieval models on both P@k and MRR.
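
The two improvements and the evaluation measures described above can be illustrated with a small Python sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption made for illustration: cosine similarity over term counts as the similarity measure, a prior proportional to the mean similarity with the interest library, a three-way linear interpolation (document, related documents, collection) as the smoothing scheme, the weights `alpha`, `lam`, `mu`, and all function names.

```python
import math
from collections import Counter

# Illustrative sketch only: the similarity measure, the weights and every
# function name below are assumptions, not the formulas used in the thesis.

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def mle(counts, term):
    """Maximum-likelihood estimate of P(term | counts)."""
    total = sum(counts.values())
    return counts.get(term, 0) / total if total else 0.0

def log_prior(doc, interest_library, eps=1e-3):
    """Log of an unnormalised prior: eps plus mean similarity to the interest library."""
    if not interest_library:
        return math.log(eps)
    sim = sum(cosine(doc, d) for d in interest_library) / len(interest_library)
    return math.log(eps + sim)

def related_counts(doc_id, docs, interaction, alpha=0.5, top_n=1):
    """Pool the term counts of the top_n posts ranked by a mix of content
    similarity and author interaction (top_n is tiny to suit toy data)."""
    scored = []
    for other_id, other in docs.items():
        if other_id == doc_id:
            continue
        mix = (alpha * cosine(docs[doc_id], other)
               + (1 - alpha) * interaction.get(frozenset((doc_id, other_id)), 0.0))
        scored.append((mix, other_id))
    scored.sort(reverse=True)
    pooled = Counter()
    for _, other_id in scored[:top_n]:
        pooled.update(docs[other_id])
    return pooled

def score(query, doc_id, docs, collection, interest_library, interaction,
          lam=0.3, mu=0.2):
    """log P(d) + sum over query terms of log P(term | d), with P(term | d)
    smoothed by the related-document set and the whole collection."""
    doc = docs[doc_id]
    related = related_counts(doc_id, docs, interaction)
    total = log_prior(doc, interest_library)
    for term in query:
        p = ((1 - lam - mu) * mle(doc, term)
             + lam * mle(related, term)
             + mu * mle(collection, term))
        total += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen terms
    return total

def rank(query, docs, collection, interest_library, interaction):
    """Document ids sorted by descending score."""
    return sorted(
        docs,
        key=lambda d: score(query, d, docs, collection, interest_library, interaction),
        reverse=True,
    )

def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result (0 if none); MRR is its mean over queries."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

if __name__ == "__main__":
    docs = {
        "a": Counter("football match tonight".split()),
        "b": Counter("football league final match".split()),
        "c": Counter("rainy weather tonight".split()),
    }
    collection = sum(docs.values(), Counter())
    interest = [Counter("football league".split())]
    interaction = {frozenset(("a", "b")): 0.8}
    print(rank(["football", "final"], docs, collection, interest, interaction))
```

In this toy setup, post "a" does not contain the query term "final", but its most related post "b" does, so the related-document term gives "a" a nonzero probability for that term; this is the sparsity effect the mixed estimation is meant to soften. The interaction scores are keyed by unordered pairs of post ids as a simplification of user-to-user interaction.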
Keywords/Search Tags:Microblog retrieval, Query likelihood model, User interest, Mixed estimation, Language model