Research On Methods Of Micro-blog's Authority For Information Retrieval

Posted on: 2018-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wei
GTID: 2348330518482375
Subject: Computer technology
Abstract/Summary:
With the continuing development of wireless network technology and the growing popularity of smart mobile devices, micro-blogging, one of the most popular forms of communication today, has become an important platform for sharing information and discussing hot topics in real time. Faced with the large volume of information on micro-blogs, users increasingly want to find the information that is useful to them. Micro-blog retrieval is an effective way to obtain such information and has attracted wide attention in the field of information retrieval. It also raises several problems, such as entity search, sentiment analysis, and the modeling of abstract notions such as authority and quality. Micro-blog posts on a given topic are written by a large number of different people, and these users differ greatly in authority. Integrating authority into the ranking of micro-blog posts can improve retrieval performance. The main work of this thesis is as follows.
First, the lack of a Chinese micro-blog information retrieval test collection has limited the development of Chinese micro-blog retrieval. Constructing a test collection is difficult and requires a great deal of manual effort. This thesis uses micro-blog tags to construct a micro-blog retrieval test collection, which greatly reduces the manual annotation workload while maintaining quality. The test collection consists of three parts: the document corpus, the query topic set, and the relevance judgment set. To build it, we take a document set downloaded from Tencent micro-blog data and extract the tags attached to the posts; a topic set of 52 queries is then determined from the documents associated with each tag and from the retrieval results. The relevance judgments used to evaluate retrieval effectiveness are generated from each query topic and its relevant documents.
Second, two methods for computing micro-blog authority are proposed. The first is based on repost counts. Mining the semantic information in micro-blog text together with user behavior shows that the quality of a post is directly related to user behavior. In particular, reposting spreads a post and indicates that users endorse its content. To compute authority, the repost count of a post is used as the document prior probability within the language modeling framework for information retrieval; the resulting authority score gives the initial ranking of the posts. The second method is based on the PostRank algorithm. Starting from the initial ranking, a repost relationship graph is constructed over the posts, and the link analysis algorithm PageRank is used to compute a score for each document. This score is combined with the initial ranking score to recompute the authority score and rerank the documents. The effectiveness of the method is verified on the Chinese micro-blog information retrieval test collection.
Keywords/Search Tags: Test collection, Language model, Authority, Link analysis
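
The two authority computations described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so Dirichlet smoothing with parameter `mu`, add-one smoothing of the repost counts, the edge direction (reposting post points to the reposted post), and the linear interpolation weight `lam` are all assumptions, as are the function names.

```python
import math
from collections import Counter


def query_likelihood_with_prior(query, docs, reposts, mu=100.0):
    """Score docs by log P(d) + sum over query terms of log P(t | d).

    docs maps doc id -> token list; reposts maps doc id -> repost count.
    P(d) is proportional to (repost count + 1), which is an assumption.
    """
    collection = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(collection.values())
    total = sum(reposts.get(d, 0) + 1 for d in docs)
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = math.log((reposts.get(doc_id, 0) + 1) / total)
        for term in query:
            if collection[term] == 0:
                continue  # term unseen in the corpus: skipped for every doc
            p_coll = collection[term] / coll_len
            # Dirichlet-smoothed term probability (smoothing choice assumed)
            score += math.log((tf[term] + mu * p_coll) / (len(tokens) + mu))
        scores[doc_id] = score
    return scores


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; edges are (reposting_post, reposted_post)."""
    nodes = list(nodes)
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by posts with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


def min_max(values):
    """Rescale scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def rerank(initial, link_scores, lam=0.5):
    """Interpolate normalized initial and link scores, best doc first."""
    a, b = min_max(initial), min_max(link_scores)
    combined = {d: (1 - lam) * a[d] + lam * b[d] for d in initial}
    return sorted(combined, key=combined.get, reverse=True)
```

In this sketch, two posts with identical text are separated only by the prior, so the more heavily reposted one ranks higher in the initial ranking. Setting `lam=0.0` in `rerank` reproduces the initial ranking, while larger values give more weight to the repost graph.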