Font Size: a A A

Improving search in unstructured peer-to-peer systems

Posted on:2009-12-21Degree:Ph.DType:Thesis
University:University of Notre DameCandidate:Acosta Negrin, William FelixFull Text:PDF
GTID:2448390002992650Subject:Computer Science
Abstract/Summary:
Peer-to-peer (P2P) systems have been studied as alternatives to centralized systems for their ability to federate the processing, storage and network resources available among a set of independently owned peers for the common good. Various distributed applications have been designed on top of a P2P infrastructure. For example, systems such as PPLive and Joost use peers for streaming videos while systems such as Gnutella and BitTorrent use peers for distributing files.;In this thesis, I focus on P2P file sharing systems. Peers in file-sharing systems form an application-level distributed overlay to manage the communication between peers. Peers then use this overlay to locate files available from other peers. The effectiveness of locating files in unstructured P2P systems depended on the ease with which other peers could be reached. Prior research efforts created overlays that had good network behavior. However, these prior research efforts did not fully investigate the nature of query resolution. In general, queries for objects were resolved by matching terms in the query to terms in the object annotations. The performance of file-sharing systems fundamentally depended not only on the ability to route queries to various peers, but also on whether the query itself could be resolved.;We performed an extensive analysis of the annotations of the files stored in the Gnutella and iTunes P2P systems as well as queries that users were issuing to locate objects in Gnutella. Despite differences in how objects were annotated between Gnutella and iTunes, the overall popularity distribution of objects was similar. I observed a fundamental mismatch between the way that objects were named and the way that they were searched in Gnutella; terms that were popular in the file annotations were not popular in the queries. This mismatch meant that many search queries were unlikely to find matching objects in the system. I showed that approximately 55% of the queries in Gnutella had no matching files. Optimizing the overlays or the search mechanisms itself was unlikely to resolve the mismatch between query terms and file annotations. These results remained consistent across my analysis of the P2P system over several years, suggesting that both the users query behavior and the way that objects were annotated had stabilized.;In this thesis, I addressed the problem of unresolved queries due to query term and object annotation mismatches by transforming the original query and selectively removing terms from the original query in order to broaden its scope. The challenge was to choose query terms without analyzing user intent or performing semantic analysis of the queries and objects. I removed query terms based on observed data from a small number of peers (< 1%) before issuing the modified query against the entire P2P system. Using captured queries and files shared by peers in Gnutella, I showed that my approach improved the maximum success rates from about 45% to over 85% of the queries. We also performed experimental analysis of the query term selection algorithm on a live Gnutella network. We showed that over 61% of the re-written queries were resolved successfully by the Gnutella network.
Keywords/Search Tags:Systems, P2P, Queries, Gnutella, Query, Peers, Search, Objects
Related items