Improving the accuracy and efficiency of result retrieval in peer-to-peer networks

Posted on:2010-12-15

Degree:Ph.D

Type:Dissertation

University:Illinois Institute of Technology

Candidate:Nguyen, Linh T

GTID:1448390002983491

Subject:Computer Science

Abstract/Summary:

The search techniques of centralized search engines are not directly applicable to peer-to-peer (P2P) networks because peers and shared data are dynamic and distributed. As searching in P2P networks involves data transfer among peers, it is important to maintain the accuracy of search results without sacrificing network bandwidth.

We propose methods to improve each of the three stages of searching in P2P networks, namely result ranking, result retrieval, and query routing.

Result ranking in P2P networks is ineffective because results are poorly described by a few keywords. We develop a metadata probing technique that allows us to enrich the metadata of the returned results. Our technique is based on the fact that peers' contents overlap, and each peer independently describes its own shared content (i.e., the same file might have different descriptions on different peers). As more metadata is retrieved from other peers, result ranking performance improves by up to 15%.

Result retrieval in P2P networks suffers from the word mismatch problem: if the short description of a relevant file does not contain all query keywords, the file will not be returned to the querying peer. To allow more relevant results to be returned, we relax the requirement of returning only results that contain all query keywords by replacing the query with one of its sub-queries (i.e., masking out one or more query keywords). Our query masking technique improves ranking performance by up to 40%.

Query routing in P2P networks aims to forward queries only to peers that have the answers. Poorly described peer contents make query routing inefficient. We propose to enhance peer content summaries without affecting the scalability of the network. Our method is based on partitioning the shared contents into a number of groups and representing each group separately. We develop a theoretical cost model and a cost-based distance function to guide the partitioning process. Our technique eliminates up to 94% of erroneously routed queries in unstructured P2P networks and reduces the query processing cost of structured P2P networks by up to 73%.
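
As an illustration of the query masking idea described in the abstract, the following is a minimal Python sketch, not the dissertation's implementation. It assumes that file descriptions are whitespace-separated keyword strings and that matching is conjunctive (every query keyword must appear). The function names `matches_all`, `masked_subqueries`, and `search`, the `max_masked` parameter, and the sample descriptions are all hypothetical. How the dissertation selects which sub-query to use is not stated in the abstract, so the sketch simply enumerates the candidates.

```python
from itertools import combinations


def matches_all(keywords, description):
    """Conjunctive match: every keyword must appear in the description."""
    terms = set(description.lower().split())
    return all(k.lower() in terms for k in keywords)


def masked_subqueries(keywords, max_masked=1):
    """Yield sub-queries formed by masking out 1..max_masked keywords.

    At least one keyword is always kept, so no empty sub-query is produced.
    """
    n = len(keywords)
    for masked in range(1, min(max_masked, n - 1) + 1):
        for kept in combinations(keywords, n - masked):
            yield list(kept)


def search(keywords, descriptions):
    """Return the descriptions that contain all of the given keywords."""
    return [d for d in descriptions if matches_all(keywords, d)]


# Hypothetical short descriptions held by peers (illustrative data only).
descriptions = [
    "beatles yesterday live",
    "yesterday remastered",
    "beatles help",
]
query = ["beatles", "yesterday", "remastered"]

# The full query suffers from word mismatch: no description has all keywords.
full_results = search(query, descriptions)

# Masking one keyword at a time lets partially described files match.
masked_results = {
    tuple(sub): search(sub, descriptions) for sub in masked_subqueries(query)
}
```

In this toy data the full three-keyword query returns nothing, while the sub-query that masks out "remastered" matches the first description and the sub-query that masks out "beatles" matches the second. The relaxed results would still need to be ranked, which the sketch does not attempt.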

Keywords/Search Tags:

Networks, P2P, Peer, Result retrieval, Query, Technique

Related items

1	Information Retrieval Based On Peer-to-Peer Computing
2	Research On Query Mechanism And Trust Model In Peer-to-Peer Networks
3	Research On Information Retrieval Ranking Optimization Methods
4	Research On The Resource Retrieval For Peer To Peer Networks
5	Query Processing In Structured Peer-to-Peer Networks
6	Query Hotspot Elimination Scheme Based On Replication In Structured Peer-to-Peer Networks
7	Study On Community-Based Information Retrieval System In Peer-to-Peer Networks
8	Resource Search Algorithm Based On P2P Network
9	Large-scale Peer-to-Peer Network Statistical Analysis, Characterization And Its Applications
10	Research On Query Processing And Result Caching In Search Engine