Active relevance feedback algorithms

Posted on: 2009-09-07

Degree: Ph.D.

Type: Dissertation

University: University of California, Santa Cruz

Candidate: Xu, Zuobing

GTID: 1448390005956420

Subject: Engineering

Abstract/Summary:

An active relevance feedback system actively selects retrieved documents for the user to evaluate for relevance, and modifies the search results based on that evaluation. The two most challenging problems in active relevance feedback are the active selection of the most informative set of feedback documents and the effective incorporation of the user's evaluation into the new search.

To overcome these challenges, we need to do the following: use a robust learning algorithm that works reasonably well and avoids overfitting when the number of feedback documents is small; design a learning algorithm that can learn effectively from negative feedback documents when the initially retrieved documents include a large share of non-relevant documents; and explore the retrieved documents to select the most informative set of feedback documents. This dissertation uses the Bayesian modeling approach as a unified framework for active relevance feedback. We develop a set of solutions that enable us to build an active relevance feedback system with the desired characteristics in a principled way. We evaluate and justify these solutions on several large and diverse standard ad hoc information retrieval test collections.

First, we design an effective active learning algorithm that explicitly considers the relevance, diversity and density of a candidate feedback document with respect to the document set under consideration. We also give an intuitive explanation of how these key factors influence active learning performance.

Second, we design an active relevance feedback algorithm based on the online Bayesian logistic regression model to reduce the overfitting caused by limited feedback documents. The new model projects the original feature space onto a more compact feature set that preserves the most important information concerning relevance. The new model also actively selects feedback documents based on a variance reduction scheme.

Third, we address the long-standing model estimation problem in probabilistic ranking models. We use the Dirichlet compound multinomial (DCM) distribution as the generative source, since the DCM distribution can account for the dependency between repeated occurrences of a word. We also design several techniques for estimating the parameters of the DCM distribution. Additionally, we propose a pseudo-relevance feedback algorithm based on latent mixture modeling of the DCM distribution to further improve retrieval accuracy.

Fourth, we address a difficult IR research problem: how to learn effectively from negative feedback documents for difficult queries. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, that uses the information in both the positive and the negative feedback documents by modeling the overlap between them. Consequently, the new algorithm substantially improves retrieval performance for difficult queries. To further reduce the human effort of relevance evaluation, we propose a new active learning algorithm that works in conjunction with this relevance feedback model. The new active learning algorithm implicitly models the diversity, density and relevance of the unlabeled data in a transductive experimental design framework.
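
As a rough illustration of why the DCM distribution captures repeated word occurrences, the sketch below evaluates the standard DCM (Polya) log-probability of a word-count vector. It is not taken from the dissertation: the function name `dcm_log_prob` and the toy parameter values are assumptions chosen for illustration, and the dissertation's own parameter estimation techniques are not reproduced here.

```python
from math import exp, lgamma


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a DCM (Polya) distribution.

    counts: non-negative integer count per vocabulary word (one document).
    alpha:  positive DCM parameter per vocabulary word (illustrative values).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)      # document length
    a = sum(alpha)       # total concentration
    # Multinomial coefficient: n! / prod(x_w!)
    log_coeff = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalizer: Gamma(a) / Gamma(n + a)
    log_norm = lgamma(a) - lgamma(n + a)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_coeff + log_norm + log_terms


# Toy two-word vocabulary with small, equal parameters (assumed values).
alpha = [0.1, 0.1]
p_repeated = exp(dcm_log_prob([2, 0], alpha))  # same word twice
p_mixed = exp(dcm_log_prob([1, 1], alpha))     # one of each word
print(p_repeated, p_mixed)
```

With small parameters such as these, the document that repeats one word receives more probability than the document that uses each word once, whereas a multinomial with equal word probabilities would favor the mixed document. This is the "burstiness" behavior that the abstract refers to as the dependency between repeated occurrences of a word.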

Keywords/Search Tags:

Relevance, Active, Algorithm, Documents, DCM distribution, Model, Evaluation, New

Related items

1	Massive query expansion for relevance feedback
2	Research Based On An Active Relevance Feedback Mechanism In Content-Based Image Retrieval
3	Research On Summary Generation Methods For Chinese-Vietnamese News Documents
4	Research On Classification Of Chinese Documents Based On Vector Space Model
5	Research On Relevance Judgement Model Of Scientific Data Users
6	The Study, Based On The Evaluation Of The Probability Models Celebrity Page
7	Relevance feedback in content-based retrieval: A study of practical aspects of implementation and evaluation
8	Image Segmentation Algorithm And Optimization Based On Active Contour Model
9	Research On Commodity Evaluation Model Based On The Helpfulness Of Online Reviews
10	Application Research Of Partial Differential Equation In Image Segmentation