
Query Expansion And Cluster Based Distributed Information Retrieval

Posted on: 2010-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: L He
GTID: 2178360302460547
Subject: Computer application technology
Abstract/Summary:
As the number of web pages grows rapidly, traditional centralized search struggles to keep retrieval efficient, and federated search has become an increasingly important direction in information retrieval. This thesis first surveys general research on distributed information retrieval (DIR). It then describes query expansion, resource selection, and result merging, the most important sub-processes in DIR. Finally, it analyzes the shortcomings of common methods for each sub-process and proposes improvements, as follows.

(1) Query expansion techniques in DIR are designed to avoid interest drift by providing richer descriptions of user queries. This thesis presents a cluster-based, collection-correlated model for query expansion in DIR. When the expansion model is constructed, locally relevant documents are often too few. To address this, documents from the same cluster are added, so that a separate query expansion model is built for each collection on the basis of a local query expansion strategy.

(2) Resource selection is one of the key problems in DIR. However, current collection selection algorithms are not well justified by probability theory, and they do not use the topic information available in each resource. We therefore introduce a cluster-based language model for resource selection. First, a relevance model is used to construct the collection selection model. Second, the whole resource is partitioned into several clusters, each of which is used to improve the precision of that model. Experimental results show that this approach consistently improves retrieval performance over CORI and CRCS.

(3) Result merging, the last step of DIR, has a direct impact on the final ranking of search results. We introduce a hybrid result merging strategy for topic-based DIR that overcomes the defects of weighted-score merging. The results from each cluster are merged by fitting a cluster-based logistic regression model that takes the collection selection score, the rank, and the retrieval status value (RSV) into account. Experimental results show that this approach effectively improves retrieval performance over traditional merging methods, which usually rely on the document relevance scores returned by remote collections.
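The merging step in (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives neither the feature encoding nor the fitted coefficients, so the `Hit` record, the use of log-rank, the function names, and all weights below are hypothetical. The sketch assumes one logistic model per cluster has already been fitted, and shows only how the three signals named in the abstract (collection selection score, rank, RSV) could be combined into a comparable relevance probability and used to merge the result lists.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One document returned by a remote collection (hypothetical record)."""
    doc_id: str
    collection_score: float  # score the collection got during resource selection
    rank: int                # 1-based rank in the collection's own result list
    rsv: float               # retrieval status value reported by the collection


# (bias, w_collection, w_log_rank, w_rsv): assumed to come from a prior fit.
Weights = Tuple[float, float, float, float]


def relevance_probability(hit: Hit, weights: Weights) -> float:
    """Logistic model: P(relevant) = 1 / (1 + exp(-z))."""
    bias, w_coll, w_rank, w_rsv = weights
    # Using log(rank) is an assumption of this sketch, not taken from the source.
    z = (bias
         + w_coll * hit.collection_score
         + w_rank * math.log(hit.rank)
         + w_rsv * hit.rsv)
    return 1.0 / (1.0 + math.exp(-z))


def merge(results_by_cluster: Dict[str, List[Hit]],
          weights_by_cluster: Dict[str, Weights]) -> List[str]:
    """Score every hit with its cluster's model and return one merged ranking."""
    scored = []
    for cluster, hits in results_by_cluster.items():
        weights = weights_by_cluster[cluster]
        for hit in hits:
            scored.append((relevance_probability(hit, weights), hit.doc_id))
    # Highest estimated relevance probability first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    results = {
        "sports": [Hit("A", 0.9, 1, 0.8), Hit("B", 0.9, 2, 0.5)],
        "news": [Hit("C", 0.4, 1, 0.9)],
    }
    weights = {
        "sports": (0.0, 1.0, -1.0, 1.0),  # illustrative values only
        "news": (0.0, 1.0, -1.0, 1.0),
    }
    print(merge(results, weights))
```

Because every hit is mapped to a probability on the same scale, documents from different collections can be compared directly, which is the property that raw, collection-specific relevance scores lack.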
Keywords/Search Tags: Distributed information retrieval, Query expansion, Resource selection, Result merging