
A Novel Model To Predict Query Performance Based On Cluster Score

Posted on: 2010-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Peng
Full Text: PDF
GTID: 2178360302459632
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of information technology, the volume of electronic data is growing explosively, and finding relevant information in large document collections has become a pressing need. Information Retrieval (IR) technology has emerged in response and is becoming increasingly important. However, most current IR systems suffer from a serious robustness problem. Predict Query Performance (PQP) technology has attracted intense interest in the IR community as a key technology for addressing this problem.

Many researchers have studied PQP and proposed effective algorithms, such as the Clarity Score model and the Robustness Score model. However, pre-retrieval methods cannot predict query performance well because they use no information about the retrieved documents. Some post-retrieval methods perform well, but they mainly examine the geometric characteristics of the retrieved documents and require substantial computing resources.

In this paper, we analyze the factors that influence an IR system and find that the queries, the documents, the IR model, and its parameters are all strongly related to retrieval performance. We also find that the main cause of retrieval failure is that the IR system has not identified all aspects of the topic.

Based on this finding and on the cluster hypothesis, we present a novel method, Cluster Score, for predicting query performance in text retrieval; it borrows some ideas from the Vector Space Model. In the Cluster Score model, Cluster Score simultaneously quantifies how well the IR system identifies each aspect of the query and how similar the returned documents are to one another. Experimental results demonstrate that Cluster Score correlates significantly and consistently with average precision across all test collections.
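The evaluation described above, correlating a predictor's per-query score with average precision, can be sketched as follows. This is only an illustration: the abstract does not say which correlation coefficient the thesis uses, so Pearson's coefficient is an assumption here, and the function names `average_precision` and `pearson` are hypothetical.

```python
import math


def average_precision(ranked_relevance, num_relevant):
    """Average precision for one query.

    ranked_relevance: booleans in rank order (True = relevant document).
    num_relevant: total number of relevant documents for the query.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / num_relevant if num_relevant else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Assumption: the thesis may use a different coefficient
    (for example a rank correlation); this is only one option.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the thesis.
    predictor_scores = [0.2, 0.5, 0.9]
    ap_values = [
        average_precision([False, False, True], 1),
        average_precision([True, False, True], 2),
        average_precision([True, True, True], 3),
    ]
    print(pearson(predictor_scores, ap_values))
```

A predictor is useful to the extent that this correlation is high and stable across queries and collections, which is the property the abstract claims for Cluster Score.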
The Cluster Score model can predict query performance precisely. Compared with other classic models, it offers the following improvements:
First, it performs much better, because it simultaneously measures how well the IR system identifies each aspect of the query and how similar the returned documents are to one another.
Second, it drops two unrealistic assumptions: that words in a document are independent of each other, and that words in a document are independent of the words in the query.
Third, Cluster Score is easy to compute from the top k returned documents.

Besides addressing the robustness problem in IR systems, the Cluster Score model can also be used to merge results in distributed IR and metasearch systems, to help users compose better queries, to support query expansion, and so on.
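The abstract does not give the Cluster Score formula, so the following is only a toy sketch of the general idea it describes: a post-retrieval score over the top k documents that combines (a) how well the query's aspects are covered and (b) how similar the returned documents are to one another, using Vector Space Model style term vectors. Everything here is an assumption for illustration: treating each query term as one "aspect", using raw term frequencies, multiplying the two parts, and the names `aspect_coverage`, `cohesion`, and `toy_cluster_score`.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Raw term-frequency vector from whitespace tokenization (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aspect_coverage(query, docs):
    """Mean, over query terms, of the fraction of docs containing the term.

    Assumption: each distinct query term stands in for one query aspect.
    """
    terms = set(query.lower().split())
    if not terms or not docs:
        return 0.0
    vectors = [tf_vector(d) for d in docs]
    per_term = [sum(1 for v in vectors if t in v) / len(vectors) for t in terms]
    return sum(per_term) / len(per_term)


def cohesion(docs):
    """Mean pairwise cosine similarity among the documents."""
    vectors = [tf_vector(d) for d in docs]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def toy_cluster_score(query, ranked_docs, k=10):
    """Illustrative score over the top k documents; NOT the thesis's formula."""
    top = ranked_docs[:k]
    return aspect_coverage(query, top) * cohesion(top)
```

Under these assumptions the score is high only when the top k documents both mention every query term and resemble one another, and it needs nothing beyond the top k documents themselves, which matches the low computational cost the abstract claims.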
Keywords/Search Tags: Information Retrieval, Text Retrieval, Predict Query Performance, Predict Query Difficulty, Cluster Score model