
Improving the accuracy and efficiency of result retrieval in peer-to-peer networks

Posted on: 2010-12-15
Degree: Ph.D.
Type: Dissertation
University: Illinois Institute of Technology
Candidate: Nguyen, Linh T
Full Text: PDF
GTID: 1448390002983491
Subject: Computer Science
Abstract/Summary:
The search techniques of centralized search engines are not directly applicable to peer-to-peer (P2P) networks because peers and their shared data are dynamic and distributed. Because searching in P2P networks involves data transfer among peers, it is important to maintain the accuracy of search results without sacrificing network bandwidth.

We propose methods to improve each of the three stages of searching in P2P networks, namely result ranking, result retrieval, and query routing.

Result ranking in P2P networks is ineffective because results are poorly described by only a few keywords. We develop a metadata probing technique that enriches the metadata of the returned results. Our technique is based on the fact that peers' contents overlap and that each peer independently describes its own shared content (i.e., the same file might have different descriptions on different peers). As more metadata are retrieved from other peers, result ranking performance improves by up to 15%.

Result retrieval in P2P networks suffers from the word mismatch problem: if the short description of a relevant file does not contain all query keywords, the file is not returned to the querying peer. To allow more relevant results to be returned, we relax the requirement that a result contain all query keywords by replacing the query with one of its sub-queries (i.e., masking out one or more query keywords). Our query masking technique improves ranking performance by up to 40%.

Query routing in P2P networks aims to forward queries only to peers that have the answers. Poorly described peer contents make query routing inefficient. We propose to enhance peer content summaries without affecting the scalability of the network. Our method is based on partitioning the shared contents into a number of groups and representing each group separately. We develop a theoretical cost model and a cost-based distance function to guide the partitioning process. Our technique eliminates up to 94% of erroneously routed queries in unstructured P2P networks and reduces the query processing cost of structured P2P networks by up to 73%.
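
The query masking idea described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names (`sub_queries`, `matches`, `masked_search`), the whitespace tokenization, and the choice to try every sub-query are assumptions made only for illustration. The abstract does not say how a sub-query is selected or how the results are ranked.

```python
from itertools import combinations


def sub_queries(keywords, max_masked=1):
    """Yield sub-queries formed by masking out 1..max_masked keywords.

    Hypothetical helper: the abstract does not specify which
    sub-query is chosen, so this sketch enumerates all of them.
    """
    unique = list(dict.fromkeys(keywords))  # drop duplicates, keep order
    for masked in range(1, max_masked + 1):
        keep = len(unique) - masked
        if keep < 1:
            break  # never issue an empty query
        for combo in combinations(unique, keep):
            yield combo


def matches(description, query):
    """Conjunctive match: every query keyword must occur in the description."""
    terms = set(description.lower().split())
    return all(keyword.lower() in terms for keyword in query)


def masked_search(descriptions, keywords, max_masked=1):
    """Map each sub-query to the descriptions it matches (assumed policy)."""
    return {
        query: [d for d in descriptions if matches(d, query)]
        for query in sub_queries(keywords, max_masked)
    }


if __name__ == "__main__":
    shared = ["beatles yesterday live", "yesterday remastered", "beatles help"]
    query = ["beatles", "yesterday", "remastered"]
    # No description contains all three keywords (the word mismatch problem).
    print([d for d in shared if matches(d, query)])
    # Masking one keyword lets partially described files be returned.
    for sub_query, hits in masked_search(shared, query).items():
        print(sub_query, hits)
```

In this toy data the full three-keyword query returns nothing, while two of the three sub-queries with one keyword masked each return a file.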
Keywords/Search Tags: Networks, P2P, Peer, Result retrieval, Query, Technique