Font Size: a A A

Research On Implicit Methods In Search Result Diversification

Posted on:2017-01-07Degree:MasterType:Thesis
Country:ChinaCandidate:Y Y WuFull Text:PDF
GTID:2308330503964116Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the continuous development of science and technology, people depend on information retrieval systems more heavily. Typically, users submit their queries to get the information that they want. A traditional retrieval system returns the results based on their matching with the query. However, the extensive use of Web search engine in recent years makes people come to understand the necessity of supporting results diversification. Keyword queries have their limitations as a way of carrying a user’s information need, which is often ambiguous or has multiple meanings. Even for the same query, different users may interest on different aspects. A traditional retrieval system is unable to meet the diversified information needs of users. Therefore, it becomes more important to support results diversification. For a given query, a retrieval system should provide results with a coverage as wide as possible so as to meet possibly different requirements of users. In a new-generation information retrieval system, it is necessary to consider both relevance of the documents to the query and diversity of the documents in the results at the same time.The existing retrieval systems which support results diversification usually take a two-stage process. The first stage is the same as the traditional information retrieval systems and only relevance of the documents to the query is considered. Based on the first stage, and the second stage tries to diversify the documents by re-ranking them. Currently, there are many methods for results diversification. We can divide them into two categories: implicit methods and explicit methods. The implicit method only consider the documents themselves in retrieval results, but does not rely on any external resources for additional information. Contrarily, The explicit methods assume that more query information is available, for example by some external resources. Previous studies often focus on a particular method itself, but it is not very clear how those proposed methods perform in different situations. Moreover, there is a lack of comparative study on those proposed methods. For a retrieval system that supports results diversification, it is worthy to do a thorough investigation about which reranking algorithm to go ahead.The main work is as follows:(1) Comparative analysis of several implicit diversification algorithm is carried out. The following algorithms are included: Maximum Marginal Relevance(MMR), Kullback–Leibler Divergence(KL) algorithm, Modern Portfolio Theory(MPT), Sparse Spatial Selection Diversification(SSSD), Quantum Probability Ranking Principle(QPRP) and Max-Sum Dispersion(MSD) algorithm. Experiments are carried out to compare the performance of these implicit diversification algorithms, and a group of diversification metrics are used to evaluate the results.(2) Based on Greedy Local Search algorithm(GLSS), this thesis proposes an improved Sparse spatial selection diversification method. The sparse spatial selection diversification algorithm regards the problem as a Facility Location Analysis problem in operations research, and uses a greedy local search method to select the top-k documents to meet the needs of users. For the performance comparison of the proposed algorithm, we use several other methods including sparse spatial selection diversification method(SSSD), quantum probability ranking principle(QPRP) and quantum probability ranking principle based on greedy local search(QPRP-GLS) as baselines. Experimental results show that GLSS performs better than the others on diversification metrics.
Keywords/Search Tags:Search result diversification, implicit re-ranking, sparse spatial selection, greedy local search
PDF Full Text Request
Related items