Research On P2P Search Technology In Uncooperative Environments

Posted on: 2011-10-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Ren
Full Text: PDF
GTID: 1118330332478364
Subject: Computer Science and Technology
Abstract/Summary:
To address the scalability problems faced by current centralized search engines, researchers have proposed federated search in P2P networks as an alternative. In the past few years, a number of systems for federated search in P2P networks have been developed. Typically, a P2P-based federated search engine consists of many distributed and autonomous peers. Each peer holds a collection of documents and can answer queries based on its local index.

Previous work assumes that all peers are "cooperative" in the sense that they always publish accurate and detailed resource descriptions of their own document repositories. However, this assumption is not always valid in real applications. For example, a hidden Web site such as a digital library or news archive may offer a document-querying service yet be unable to release an accurate description of its own archive. If we want to integrate such databases into the search system, it is impractical to obtain resource descriptions from them. Peers of this kind, which can answer queries but do not provide any resource descriptions of their own repositories, are often referred to as uncooperative peers in the literature. Given the massive quantity and coverage of uncooperative peers on the Web, incorporating them into a P2P search system is a potentially significant and challenging problem.

The main contributions of our work can be summarized as follows:

(1) We present a practical framework called PISA to support federated search in a structured P2P network containing uncooperative peers. A structured P2P network is employed to achieve high search efficiency. The distributed data structure in PISA can index the resources stored in both cooperative and uncooperative peers and provide an efficient search service. Compared to conventional structured P2P search systems, our system is able to effectively utilize the search service provided by uncooperative peers.

(2) We propose a novel heuristic query-based sampling (HQBS) technique to acquire the resource descriptions of uncooperative peers. PISA employs this sampling technique to generate the distributed index directory.

(3) We devise an approach named OPS (Overlap-aware Peer Selection) that improves peer selection by considering the overlap among uncooperative peers in P2P-based federated search. This method derives peer-specific and query-specific coverage statistics of uncooperative collections from past query results and uses the statistics to estimate the novelty of each peer. To the best of our knowledge, no such approach has previously been explored in the literature on P2P systems.

(4) For the result merging and ranking problem, we introduce two effective methods called RISE and RISE+ to merge query results returned by uncooperative peers into a single ranked list. The proposed methods are able to handle "uncooperative" results, which do not contain relevance scores.

(5) We address the problem of index directory maintenance in a dynamic environment, in which peers may join or leave the system at any time and the content of each peer may evolve continually. To guarantee accurate search results, the index directory needs to be updated to reflect those changes. In view of this, we present an efficient method, namely CSU, for keeping the index directory up to date at a low cost.
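The idea of overlap-aware peer selection mentioned above can be illustrated with a small sketch. This is not the OPS algorithm from the dissertation, whose details the abstract does not give; it is a generic greedy selection under stated assumptions. The function name `select_peers`, the peer names, and the `past_results` mapping (document identifiers each peer returned for similar past queries) are all hypothetical.

```python
from typing import Dict, List, Set


def select_peers(past_results: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k peers, each time taking the peer that adds
    the most documents not already covered by the peers chosen so far.

    Assumption: past_results maps a peer name to the set of document IDs
    it returned for similar past queries (a stand-in for coverage statistics).
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    candidates = dict(past_results)
    while candidates and len(chosen) < k:
        # Novelty of a peer = number of its documents not yet covered.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] - covered))
        if not candidates[best] - covered:
            break  # every remaining peer is fully redundant
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


# Hypothetical data: peer_b only duplicates documents that peer_a holds.
past_results = {
    "peer_a": {"d1", "d2", "d3"},
    "peer_b": {"d1", "d2"},
    "peer_c": {"d4"},
}
print(select_peers(past_results, k=2))
```

In this toy example a selection based on collection size alone would prefer peer_b over peer_c, whereas the novelty-based choice skips peer_b because it contributes nothing new once peer_a has been selected.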
Keywords/Search Tags: P2P Network, Uncooperative Environments, Information Retrieval, Distributed Information Retrieval, Resource Description, Resource Selection, Result Merging, Index Directory Maintenance