Improving search in unstructured peer-to-peer systems

Posted on:2009-12-21

Degree:Ph.D

Type:Thesis

University:University of Notre Dame

Candidate:Acosta Negrin, William Felix

Full Text:PDF

GTID:2448390002992650

Subject:Computer Science

Abstract/Summary:

Peer-to-peer (P2P) systems have been studied as alternatives to centralized systems for their ability to federate the processing, storage and network resources available among a set of independently owned peers for the common good. Various distributed applications have been designed on top of a P2P infrastructure. For example, systems such as PPLive and Joost use peers for streaming videos while systems such as Gnutella and BitTorrent use peers for distributing files.;In this thesis, I focus on P2P file sharing systems. Peers in file-sharing systems form an application-level distributed overlay to manage the communication between peers. Peers then use this overlay to locate files available from other peers. The effectiveness of locating files in unstructured P2P systems depended on the ease with which other peers could be reached. Prior research efforts created overlays that had good network behavior. However, these prior research efforts did not fully investigate the nature of query resolution. In general, queries for objects were resolved by matching terms in the query to terms in the object annotations. The performance of file-sharing systems fundamentally depended not only on the ability to route queries to various peers, but also on whether the query itself could be resolved.;We performed an extensive analysis of the annotations of the files stored in the Gnutella and iTunes P2P systems as well as queries that users were issuing to locate objects in Gnutella. Despite differences in how objects were annotated between Gnutella and iTunes, the overall popularity distribution of objects was similar. I observed a fundamental mismatch between the way that objects were named and the way that they were searched in Gnutella; terms that were popular in the file annotations were not popular in the queries. This mismatch meant that many search queries were unlikely to find matching objects in the system. I showed that approximately 55% of the queries in Gnutella had no matching files. Optimizing the overlays or the search mechanisms itself was unlikely to resolve the mismatch between query terms and file annotations. These results remained consistent across my analysis of the P2P system over several years, suggesting that both the users query behavior and the way that objects were annotated had stabilized.;In this thesis, I addressed the problem of unresolved queries due to query term and object annotation mismatches by transforming the original query and selectively removing terms from the original query in order to broaden its scope. The challenge was to choose query terms without analyzing user intent or performing semantic analysis of the queries and objects. I removed query terms based on observed data from a small number of peers (< 1%) before issuing the modified query against the entire P2P system. Using captured queries and files shared by peers in Gnutella, I showed that my approach improved the maximum success rates from about 45% to over 85% of the queries. We also performed experimental analysis of the query term selection algorithm on a live Gnutella network. We showed that over 61% of the re-written queries were resolved successfully by the Gnutella network.

Keywords/Search Tags:

Systems, P2P, Queries, Gnutella, Query, Peers, Search, Objects

Related items

1	The Research On Tree-Structure Search Mechanism In Gnutella
2	On understanding and classifying web queries
3	Research On Processing Of Nearest Neighbor And Reverse Nearest Neighbor Queries Over Moving Objects In Road Networks
4	The Storage And Query Processing Of Multi-attribute Moving Objects
5	Data Management Of Indoor Moving Objects
6	Research On Algorithms Of Aggregate Queries In Spatio-temporal Databases
7	Valid Routing Research Based On P2P Network
8	The Research Of Key Techniques Of Uncertain TOP-K Query Processing
9	Research On Data Queries And Processing Techniques In Moving Objects Databases
10	Index-based, Encrypted Database Query