Query Expansion And Cluster Based Distributed Information Retrieval

Posted on:2010-10-18

Degree:Master

Type:Thesis

Country:China

Candidate:L He

Full Text:PDF

GTID:2178360302460547

Subject:Computer application technology

Abstract/Summary:

As the number of web pages grows rapidly, it is increasingly difficult for traditional centralized search to keep retrieval efficient, so federated search has become an important direction in information retrieval. This thesis first surveys general research on distributed information retrieval (DIR). It then describes query expansion, source selection and result merging as the most important sub-processes of DIR. Finally, it analyzes the shortcomings of the commonly used methods for each DIR sub-process and proposes the following improvements.

(1) Query expansion in DIR aims to avoid topic drift by providing richer descriptions of user queries. This thesis presents a cluster-based, collection-specific model for query expansion in DIR. Building an expansion model usually suffers from too few locally relevant documents; to address this, documents from the same cluster are added, and a separate query expansion model is built for each collection following a local query expansion strategy.

(2) Resource selection is one of the key problems in DIR. However, current collection selection algorithms are not well justified by probability theory, and they do not use the topic information contained in each resource. We therefore introduce a cluster-based language model for resource selection. First, a relevance model is used to construct the collection selection model. Second, each resource is divided into several clusters, which are used to improve the precision of that model. Experimental results show that our approach consistently improves retrieval performance over CORI and CRCS.

(3) Result merging, the last step of distributed information retrieval, directly affects the final ranking of search results. This thesis introduces a hybrid result merging strategy for topic-based distributed information retrieval that overcomes the defects of weighted-score merging.
In our approach, results from each cluster are merged by fitting a cluster-based logistic regression model that takes the collection selection score, the document's rank and its RSV (retrieval status value) into account. Experimental results show that our approach effectively improves retrieval performance over traditional merging methods, which usually rely on the document relevance scores returned by remote collections.
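The cluster-augmented local feedback described in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's model: documents are token lists, the cluster assignment `cluster_of` is given in advance, documents sharing a cluster with the top-k local hits receive a hypothetical down-weight `cluster_weight`, and candidate terms are scored by weighted relative frequency (a simplified, relevance-model-style heuristic). The helper `expansion_terms` is hypothetical and would be called once per collection, so each collection gets its own expansion terms.

```python
from collections import Counter


def expansion_terms(query_terms, ranked_ids, docs, cluster_of,
                    k=1, cluster_weight=0.5, n_terms=3):
    """Pick expansion terms for ONE collection (hypothetical helper).

    query_terms: original query tokens
    ranked_ids:  this collection's doc ids, ranked for the original query
    docs:        doc id -> list of tokens
    cluster_of:  doc id -> cluster id (assumed precomputed)
    """
    # Local pseudo-relevant documents: the top-k hits in this collection.
    weights = {d: 1.0 for d in ranked_ids[:k]}
    # The local set is usually too small, so add documents that share a
    # cluster with any top-k hit, using a reduced (assumed) weight.
    top_clusters = {cluster_of[d] for d in weights}
    for d in docs:
        if d not in weights and cluster_of[d] in top_clusters:
            weights[d] = cluster_weight

    # Score terms by weighted relative frequency over the feedback set.
    scores = Counter()
    for d, w in weights.items():
        tokens = docs[d]
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            scores[term] += w * count / len(tokens)
    total = sum(weights.values())

    # Never re-add original query terms; break score ties alphabetically.
    candidates = [(s / total, t) for t, s in scores.items() if t not in query_terms]
    candidates.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in candidates[:n_terms]]


docs = {
    "d1": ["jaguar", "car", "engine"],
    "d2": ["jaguar", "speed", "engine"],
    "d3": ["engine", "fuel", "car"],
    "d4": ["cat", "jungle", "jaguar"],
}
cluster_of = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
print(expansion_terms(["jaguar"], ["d1", "d4", "d2", "d3"], docs, cluster_of))
# prints ['engine', 'car', 'fuel']
```

With `k=1` only `d1` is locally "relevant", yet its cluster-mates `d2` and `d3` still contribute evidence, while `d4` (another cluster) is ignored; this is the sparsity problem the cluster augmentation is meant to ease.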
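Point (2) scores collections with a language model that also exploits topic clusters inside each resource. The sketch below shows one simple way such a cluster-based score could be computed; it is an illustration under assumptions, not the thesis's exact model. Each collection is a list of clusters of token lists; the probability of a query term under a cluster is a Jelinek-Mercer mixture of the cluster, collection and global unigram models with hypothetical weights `lam` and `beta`; and the collection score mixes the per-cluster query likelihoods, using each cluster's share of documents as its prior. All function names are hypothetical.

```python
import math
from collections import Counter


def unigram(docs):
    """Term counts and total token count for an iterable of token lists."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts, sum(counts.values())


def p(model, term):
    counts, total = model
    return counts[term] / total if total else 0.0


def collection_log_score(query, clusters, global_model, lam=0.5, beta=0.3):
    """log P(query | collection) as a prior-weighted mixture over its clusters."""
    coll_model = unigram(d for cluster in clusters for d in cluster)
    n_docs = sum(len(cluster) for cluster in clusters)
    log_terms = []
    for cluster in clusters:
        if not cluster:
            continue  # empty clusters carry no probability mass
        cl_model = unigram(cluster)
        log_lik = math.log(len(cluster) / n_docs)  # P(cluster | collection)
        for term in query:
            if p(global_model, term) == 0.0:
                continue  # term unseen everywhere: no evidence either way
            mix = (lam * p(cl_model, term) + beta * p(coll_model, term)
                   + (1 - lam - beta) * p(global_model, term))
            log_lik += math.log(mix)
        log_terms.append(log_lik)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(x - top) for x in log_terms))


def rank_collections(query, collections):
    """collections: name -> list of clusters, each a list of token lists."""
    global_model = unigram(d for clusters in collections.values()
                           for cluster in clusters for d in cluster)
    scores = {name: collection_log_score(query, clusters, global_model)
              for name, clusters in collections.items()}
    return sorted(scores, key=scores.get, reverse=True)


collections = {
    "sports": [[["football", "match", "goal"], ["goal", "keeper"]], [["tennis", "match"]]],
    "tech": [[["cpu", "chip", "cache"], ["gpu", "chip"]], [["football", "app"]]],
}
print(rank_collections(["chip"], collections))  # prints ['tech', 'sports']
```

Smoothing with the global model keeps every query term's probability positive, so a collection lacking a term is penalized rather than scored as impossible. Using the document share as the cluster prior is only one possible choice.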
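The merging step in (3) can be illustrated with one logistic regression model per cluster. In this sketch, an assumption rather than the thesis's exact formulation, each returned document is described by the three features named above: the collection selection score, its rank in the source result list (encoded here as reciprocal rank, an assumed choice), and its RSV. Each cluster's model is fitted by plain batch gradient descent on training judgments, and the merged list is ordered by predicted probability of relevance. The training data, the cluster ids and the hyperparameters `lr` and `epochs` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(sel_score, rank, rsv):
    # Collection selection score, reciprocal rank (assumed encoding), RSV.
    return [sel_score, 1.0 / rank, rsv]


def predict(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = predict(w, x) - label
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def merge(results, models):
    """results: (doc_id, cluster_id, sel_score, rank, rsv) tuples from all sources."""
    scored = [(predict(models[c], features(s, r, v)), d) for d, c, s, r, v in results]
    scored.sort(key=lambda pd: -pd[0])
    return [d for _, d in scored]


# Hypothetical training judgments per cluster: (sel_score, rank, rsv) -> relevant?
train = {
    "c0": ([(0.9, 1, 0.8), (0.9, 2, 0.6), (0.9, 5, 0.1), (0.9, 10, 0.05)], [1, 1, 0, 0]),
    "c1": ([(0.4, 1, 0.9), (0.4, 2, 0.3), (0.4, 5, 0.2), (0.4, 10, 0.1)], [1, 0, 0, 0]),
}
models = {c: fit_logreg([features(*row) for row in rows], labels)
          for c, (rows, labels) in train.items()}
results = [("a", "c0", 0.9, 1, 0.8), ("b", "c1", 0.4, 2, 0.3),
           ("c", "c0", 0.9, 10, 0.05), ("d", "c1", 0.4, 1, 0.9)]
print(merge(results, models))
```

Because every document ends up with a calibrated probability instead of a raw remote score, lists from different clusters become comparable without trusting each collection's own scoring scale.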

Keywords/Search Tags:

Distributed information retrieval, Query expansion, Resource selection, Result merging

Related items

1	Improving Resource Selection And Result Merging In An Uncooperative Search Environment
2	Research On P2P Search Technology In Uncooperative Environments
3	The Distribution Scheduling And Result Merging Of Distributed Search Engine System
4	Research On Retrieval Algorithms For Non-cooperative Distributed Search Engines With Multiple Verticals
5	Research On Learning Algorithms For Resource Selection And Results Merging In Distributed Information Retrieval
6	Research And Implementation Of Result Merging In Distributed Search
7	Research On Information Retrieval Ranking Optimization Methods
8	Information Retrieval Collection Selection Method Based On Distributed Representation And Local Sorting
9	Information Retrieval System Based On Document Query
10	Graph-Knowledge-Bases Based Collection Selection For Distributed Information Retrieval