
Research On Microblog Retrieval Based On BTM And Query Extension

Posted on: 2019-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: C Cai
GTID: 2428330602960390 | Subject: Engineering
Abstract/Summary:
With the continued development and spread of the Internet, microblogging has become an increasingly popular online platform and form of social media. Twitter, which is popular worldwide, and Chinese microblogs such as Sina Weibo and Tencent Weibo have large user bases and generate hundreds of millions of bytes of content every day. A microblog message is limited to 140 characters, is written informally, and is mixed with many Internet slang terms and emoji. As microblog data grows rapidly, it becomes especially important to retrieve the valuable real-time information users need from this chaotic mass of short texts. Traditional information retrieval techniques still have many shortcomings in this setting. Building on previous research, this thesis takes microblogs as its research object and studies short-text retrieval techniques for them in depth. Drawing on the salient features of microblogs, the main work is as follows:

(1) A microblog retrieval model based on BTM and graph theory. To address the difficulties caused by the short length of microblog texts, their sparse features, and the large volume of data, this thesis proposes a comprehensive microblog retrieval model that combines three similarity measures: feature similarity based on microblog hashtags, similarity based on a BTM (biterm topic model) built over the ordinary text, and similarity of the implicit structure behind the text. The model works from three perspectives. First, it uses the hashtag, a strong microblog-specific topic feature, to retrieve relevant posts effectively. Second, it builds a BTM over the ordinary text to address short-text sparsity and the lack of context. Finally, it mines the entity relationship graph between posts to obtain the similarity relationships between microblogs. The experimental results show that the model is superior to the original model in MAP, accuracy, and recall, and has better retrieval performance.

(2) An improved microblog query expansion method based on frequent word sets. Targeting long microblog posts, articles, news, and similar content, this thesis constructs a query expansion model that improves on frequent word sets. The new model starts from the breadth of query expansion, so that the expansion words for a query cover a wider range. The concept of an outreach relationship is proposed on top of frequent word sets, which further strengthens the associations between words and mines more semantic information. To take into account how words are distributed across categories, information gain is incorporated into the word weighting method, so that the category information of words in the document set is effectively preserved. A word similarity matrix is then constructed from the frequent word sets, and non-negative matrix factorization is applied to extend it to the short-text space, which better addresses the sparsity of microblog text. The experimental results show that the model is superior to the original model in both Purity and F value, which demonstrates the effectiveness of the proposed method.
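As a rough illustration of the BTM component in contribution (1): a biterm topic model is trained on biterms, that is, unordered pairs of words co-occurring in the same short text, rather than on per-document word counts, which is what lets it cope with sparse posts. The sketch below shows only that biterm extraction step in Python. The function names and the toy token lists are assumptions made for illustration; the thesis's actual preprocessing and inference procedure are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    The whole post is treated as a single context window, which is the
    usual simplification for very short texts (an assumption here).
    """
    # Sort each pair so that (a, b) and (b, a) are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over a corpus of tokenized posts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical, already-tokenized posts (illustrative data only).
docs = [
    ["weibo", "retrieval", "topic"],
    ["weibo", "topic", "hashtag"],
]
biterm_counts = count_biterms(docs)
```

A topic model would then be fitted to the corpus-wide biterm collection rather than to individual posts, so word co-occurrence evidence is pooled across the whole corpus.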
Keywords/Search Tags:Microblog, Retrieval model, BTM, Graph theory, Query expansion