Modeling Score Distributions for Information Retrieval

Posted on: 2013-05-10

Degree: Ph.D.

Type: Thesis

University: Northeastern University

Candidate: Dai, Keshi

GTID:2458390008987352

Subject: Computer Science

Abstract/Summary:

When a user submits a query to a search engine, the engine computes a score for each document according to its relevance to the query and ranks the documents by score. Because modern search engines are complex, the raw score alone is not sufficient for information retrieval applications that must combine different ranked lists. Inferring the score distributions of relevant and non-relevant documents and estimating the probability of relevance therefore become imperative. In this thesis, we address two major research questions: (1) How can score distributions for relevant and non-relevant documents be modeled more accurately? (2) How can score distributions be better inferred in practice when relevance information is absent?

In the first part of the thesis, we show the problems with today's most widely used score distribution model and propose to model relevant document scores by a mixture of Gaussian distributions and non-relevant scores by a Gamma distribution. We then model score distributions in a more systematic manner. Starting from a basic assumption about the distribution of terms in a document, the distribution of the scores of retrieved documents can be derived through the transformations applied to the term frequency. The score distribution of relevant documents can likewise be derived through a general mathematical framework, given the score distribution of all retrieved documents.

The second part of the thesis presents a new framework for inferring score distributions when relevance information is unavailable. The new inference process extends the expectation maximization algorithm by simultaneously considering the ranked lists of documents returned by multiple retrieval systems, and it encodes the constraint that the same document retrieved by multiple systems should have the same global probability of relevance. Combining these contributions, we demonstrate that the approach is more effective when applied to the task of metasearch.
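
As a rough illustration of the model family named in the abstract, the sketch below turns a retrieval score into a probability of relevance with Bayes' rule, using a Gaussian mixture for relevant scores and a Gamma density for non-relevant scores. It is not the thesis's implementation: the function names, the prior, and every parameter value are hypothetical, and in practice the parameters would have to be fitted (for example by an EM procedure) rather than set by hand.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    """Density of a Gamma distribution (shape/scale form); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return (x ** (shape - 1.0)) * math.exp(-x / scale) / (
        math.gamma(shape) * scale ** shape
    )


def relevant_pdf(x, components):
    """Gaussian mixture density; components is a list of (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def prob_relevant(score, prior, components, shape, scale):
    """Posterior P(relevant | score) under the two-class score model."""
    rel = prior * relevant_pdf(score, components)
    non = (1.0 - prior) * gamma_pdf(score, shape, scale)
    total = rel + non
    return rel / total if total > 0.0 else 0.0


# Hypothetical parameters, chosen only to make the example concrete.
COMPONENTS = [(0.6, 6.0, 1.0), (0.4, 9.0, 1.5)]  # relevant: 2-component mixture
SHAPE, SCALE = 2.0, 1.0                          # non-relevant: Gamma(2, 1)
PRIOR = 0.1                                      # assumed fraction of relevant docs

if __name__ == "__main__":
    for s in (2.0, 5.0, 8.0):
        p = prob_relevant(s, PRIOR, COMPONENTS, SHAPE, SCALE)
        print(f"score={s:.1f}  P(relevant | score)={p:.4f}")
```

Because the output is a probability rather than a system-specific score, values from different retrieval systems become comparable, which is what makes this kind of model useful for combining ranked lists.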

Keywords/Search Tags:

Score, Search engine, Information, Relevance, Model, Document

Related items

1	Methods Design Of Search Engine Web Page Relevance Assessment And Application In Rank Model
2	The Design And Implementation Of Chinese Personal Name Search Engine
3	Research And Implementation Of A Chinese Full-Text Information Retrieval Technology Based-on Lucene Search Engine
4	Research On Content Search Engine Based On The Topic Relevance Routing In P2P Networks
5	Current Status Research And Improved Design Of Meta Search Engine
6	Application And Research Of Document Cluster In Web Results Of Search Engine
7	Relevance Assessment (Un-)Reliability in Information Retrieval: Minimizing Negative Impact
8	Application And Research Of Web Document Clustering In Search Engine
9	The Design And Implementation Of Full Text Search Engine Based On Lucene
10	"Luder" Content Based Document Search Engine