The Research And Implementation Of The Key Technologies On Federated Search Systems

Posted on: 2016-09-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Z M Chen
GTID: 2308330479493909 | Subject: Computer system architecture
Abstract/Summary:
Federated search is one of the most important research areas in information retrieval. Although general-purpose search engines help users find information, the widespread existence of the uncrawlable deep web limits users' access to relevant content. Federated search, the technique of querying multiple document resources simultaneously, can effectively address this problem. This thesis focuses on the key technologies of federated search systems: resource description, resource selection, vertical selection, and results merging. The major contributions are as follows.

First, for resource description, this thesis examines whether query-based sampling is reliable and valid on a Chinese corpus. It crawls Sohu Web data, builds a Chinese corpus dataset from it, and samples that dataset with query-based sampling. The results verify the reliability and validity of query-based sampling on a Chinese corpus.

Second, for resource selection, this thesis aims to apply existing resource selection methods to federated search systems built on multiple web search engines. It proposes four strategies to address two problems: inaccurate estimation of document resource size, and the lack of enough candidate resources to select from. Experimental results show that the proposed strategies can be applied effectively in a federated search environment with multiple web search engines.

Third, for vertical selection, this thesis builds on resource selection to propose a rule-based vertical selection method. The method treats each vertical as a single resource and applies rule-based resource selection to choose verticals. Experimental results show that this method greatly improves precision and recall compared with conventional vertical selection methods.

Fourth, for results merging, this thesis proposes a basic framework based on the vertical characteristics of the resources. The framework mainly addresses score normalization for the returned web pages, the resources, and the verticals. Based on this framework, the thesis proposes two results merging algorithms. Experimental results show that both algorithms improve the accuracy of the search results by more than 23% compared with existing methods, and that they also perform well on vertical diversification.

Finally, building on the above work and on the features of the distributed search engine platform SE6, this thesis designs the resource selection and results merging modules. Results from the running system again show that the proposed resource selection and results merging methods effectively improve search precision.
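
The abstract names score normalization as the core issue in results merging but does not give the two algorithms themselves. As a purely illustrative sketch (not the thesis's method), the following Python code shows one common baseline: min-max normalize the scores returned by each resource, then weight them by a per-resource score before sorting. The function names, the weighting scheme, and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; identical scores all map to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def merge_results(
    results_by_resource: Dict[str, List[Tuple[str, float]]],
    resource_weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge per-resource result lists into one ranked list.

    Hypothetical baseline: each document's final score is its
    min-max-normalized score times the weight of its resource.
    Resources without a weight contribute a score of 0.0.
    """
    merged: List[Tuple[str, float]] = []
    for resource, results in results_by_resource.items():
        if not results:
            continue  # nothing returned by this resource
        docs, scores = zip(*results)
        normalized = min_max_normalize(list(scores))
        weight = resource_weights.get(resource, 0.0)
        merged.extend((doc, weight * s) for doc, s in zip(docs, normalized))
    # Highest merged score first; ties keep insertion order (stable sort).
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


# Made-up example: two resources with incomparable raw score scales.
sample_results = {
    "news": [("n1", 10.0), ("n2", 5.0), ("n3", 0.0)],
    "web": [("w1", 0.9), ("w2", 0.1)],
}
sample_weights = {"news": 0.5, "web": 1.0}
ranking = merge_results(sample_results, sample_weights)
```

Normalizing per resource before weighting keeps a resource with a large raw score scale (here the hypothetical "news" engine) from dominating the merged list purely because of its scale.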
Keywords/Search Tags: federated search systems, resource description, resource selection, vertical selection, results merging