Font Size: a A A

Research And Application Of Diverse Ranking In Information Retrieval

Posted on:2012-04-17Degree:DoctorType:Dissertation
Country:ChinaCandidate:G L LinFull Text:PDF
GTID:1488303356993029Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology, the Web has become the largest information base in the world. Web information retrieval systems (Search engines) provide a lot of convenience for Web search. However, as the research of Information Retrieval goes further, many researchers have recognized the need of diversity in search result ranking, which can't be satisfied by traditional ranking models. They introduce a new hot research direction in Information Retrieval ---- diverse ranking.To satisfy the need of diversity, search engines have to uncover the intents underlying users'queries, and provide a diverse result list for the users with different intent. How to maximize users'satisfaction by ranking the retrieved documents according to users'queries when the set of underlying intent is unknown is the key of diverse ranking problem. The existing works can be divided into two categories: implicit diversification and explicit diversification. Implicit diversification approaches usually provide diverse ranking by modeling the difference of facet covering between documents based on certain assumptions, while explicit diversification approaches improve diversity by directly modeling the set of underlying intent.This paper focuses on the research and application of diverse ranking in Information Retrieval, and tries solving the diverse ranking problem from the two aforementioned angles. Firstly, it proposes two diversification methods for solving the redundancy problem of existing implicit diversification methods, based on document similarity comparison and information space covering respectively. Secondly, it proposes a new explicit diversification method which estimates the set of underlying intent from both angles of system and user. Finally, based on the above works, it implements a diverse ranking system.The main contributions of this paper include:1) It proposes a diversification method named DATAR based on absorbing Markov random walks. In particular, it tries to solve diverse ranking problem based on document similarity comparison. To solve the redundancy problem of an existing diversification method named Grasshopper, it proposes a new strategy for document similarity comparison, which can better uncover the difference of facet covering between documents due to the characteristic of time to absorption of states in absorbing Markov chain. Experiment results show that DATAR outperforms Grasshopper.2) It formulates the diverse ranking problem as document set utility maximization problem. Then it gives an analysis on the NP-hardness of the problem, and a proof for the submodularity of the objective function. Moreover, it proposes a framework for diversification methods based on document similarity comparison, and gives an analysis of the effectiveness of the framework. These works improve the theoretical research of the diverse ranking problem.3) It proposes a new diversification prototype KED based on keywords. Specifically, it tries solving the diverse ranking problem by information space covering. It utilizes extracted keywords to represent the information elements related to a query, and it is the first diversification method which models the distance between the keywords. Experiment results show that KED can stably outperforms several existing implicit diversification methods, and its extracted keywords contribute to its effectiveness.4) It proposes an online diversification algorithm named CRBA based on document clustering and user clicks. It's the first explicit diversification method which estimates the set of underlying intent from both system and user angles. It firstly gets a rough estimation of the set of underlying intent by document clustering, and then achieves better estimation by the interaction with users. It can dynamically adjust the ranking positions of documents for better satisfying users with different intent. The effectiveness of the algorithm is validated by experiments. Moreover, under certain conditions, CRBA has a worst-case lower bound.5) It accomplishes the design and implementation of a diverse ranking system. The system, which makes use of the great power of document retrieval of the existing search engines, can provide relevance but diverse search results.
Keywords/Search Tags:Information Retrieval, Result Diversification, Document Ranking
PDF Full Text Request
Related items