
Active relevance feedback algorithms

Posted on: 2009-09-07
Degree: Ph.D
Type: Dissertation
University: University of California, Santa Cruz
Candidate: Xu, Zuobing
Full Text: PDF
GTID: 1448390005956420
Subject: Engineering
Abstract/Summary:
An active relevance feedback system actively selects retrieved documents for user relevance evaluation and modifies the search results based on that evaluation. The most challenging problems in active relevance feedback are the active selection of the most informative feedback document set and the effective incorporation of user evaluation into the new search.

To overcome these challenges, we need to do the following: use a robust learning algorithm that works reasonably well and avoids overfitting when the number of feedback documents is small; design a learning algorithm that can learn effectively from negative feedback documents when the initially retrieved documents contain a significant number of non-relevant documents; and explore the retrieved documents to select the most informative feedback document set. This dissertation uses the Bayesian modeling approach as a unified framework for active relevance feedback. We develop a set of solutions that enable us to build an active relevance feedback system with the desired characteristics in a principled way. We evaluate and justify these solutions on several large and diverse standard ad hoc information retrieval test collections.

First, we design an effective active learning algorithm that explicitly considers the relevance, diversity and density of a candidate feedback document within the document set under consideration. We also give an intuitive explanation of how these key factors influence active learning performance.

Second, we design an active relevance feedback algorithm based on the online Bayesian logistic regression model to reduce the overfitting caused by limited feedback documents. The new model projects the original feature space onto a more compact set that preserves the most important information concerning relevance. The new model also actively selects feedback documents based on a variance reduction scheme.

Third, we solve the long-standing model estimation problem in probabilistic ranking models. We apply the Dirichlet compound multinomial (DCM) distribution as the generative source, since the DCM distribution is able to account for the dependency between repeated occurrences of a word. We also design several techniques to estimate the parameters of the DCM distribution. Additionally, we propose a pseudo-relevance feedback algorithm based on latent mixture modeling of the DCM distribution to further improve retrieval accuracy.

Fourth, we solve a difficult IR research problem: how do we learn effectively from negative feedback documents for difficult queries? We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, that effectively utilizes the information from both the positive and negative feedback documents by modeling the overlap between them. Consequently, the new algorithm improves retrieval performance substantially for difficult queries. To further reduce human effort in relevance evaluation, we propose a new active learning algorithm that works in conjunction with this relevance feedback model and thus enhances efficiency. The new active learning algorithm implicitly models the diversity, density and relevance of the unlabeled data in a transductive experimental design framework.
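
As a minimal sketch of why the DCM distribution can account for repeated occurrences of a word, the Python function below computes the standard DCM (Polya) log-probability of a word-count vector. The function name `dcm_log_likelihood`, the toy two-word vocabulary and the parameter values are illustrative assumptions and are not taken from the dissertation; its parameter estimation techniques are not reproduced here.

```python
from math import lgamma, exp


def dcm_log_likelihood(counts, alpha):
    """Log-probability of a word-count vector under a DCM (Polya) distribution.

    counts: non-negative integer word counts for one document.
    alpha:  positive DCM parameters, one per vocabulary word (assumed given).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha values must be positive")

    n = sum(counts)        # document length
    alpha_sum = sum(alpha)

    # Multinomial coefficient: n! / prod(x_w!)
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalizer: Gamma(sum alpha) / Gamma(n + sum alpha)
    log_p += lgamma(alpha_sum) - lgamma(n + alpha_sum)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return log_p


if __name__ == "__main__":
    small_alpha = [0.1, 0.1]  # hypothetical values chosen to show burstiness
    print(exp(dcm_log_likelihood([2, 0], small_alpha)))  # same word twice
    print(exp(dcm_log_likelihood([1, 1], small_alpha)))  # two different words
```

With small parameters, the count vector `[2, 0]` (one word repeated) is more probable than `[1, 1]`, whereas a multinomial with equal word probabilities assigns `[1, 1]` twice the probability of `[2, 0]`. This is the sense in which the DCM captures dependency between repeated occurrences.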
Keywords/Search Tags:Relevance, Active, Algorithm, Documents, DCM distribution, Model, Evaluation, New