Font Size: a A A

Text Retrieval In P2P Networks Based On Small World Model

Posted on:2009-02-18Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q W ShiFull Text:PDF
GTID:1118360272985625Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Comparing to the centralized search engine, text retrieval systems built on top of P2P networks have innate superiority in scalability, data update, maintenance cost, safety, and so forth. Due to lack of globe knowledge of networks topology, how to locate data in P2P networks and reduce communication overhead become the major issues in text retrieval under the P2P environment. In this paper, problems of text retrieval in P2P networks are studied based on the small world model and the main contributions are as follows.For the problem of high-dimensional, sparse documentation matrix, this paper presents a dimension reduction method to documents in nodes of P2P networks. In this method, words which appear more than twice are picked up to represent the content of the nodes, and then improved STC algorithm is employed to build a hierarchical structure with the selected words. In the calculation of document vector similarity, words at different levels in the hierarchical structure are given different weights with the sigmoid function.For the flooding problem in Gnutella, a text retrieval method is proposed in unstructured P2P networks based on small world models. Each node in the P2P networks maintains several long-link neighbours and short-link neighbours to build a small world P2P networks. Neighbour update is processed during the query and response. And after each query, weights of the words in documents of neighbour nodes will be revised. This makes it rapidly to find out the networks topology and contents of other nodes. Experiment results show that small world P2P networks yields higher recall in comparison with Gnutella and takes on the characteristics of small world with greater clustering coefficient and lower average path length.For the problems of complex queries, load balance and routing efficiency in structured P2P networks based on DHT, a protocol in structured P2P networks called SPPSW is designed with Kleinberg small world model. In SPPSW, nodes are clustered with the similarity of nodes contents. Nodes in the same cluster can select neighbours according to the similarity, and finally the network is made up of linked cluster. Node clusters could adjust its size by dividing into smaller one or merging into larger one. With SPPSW, routing path length is shortened by the long-link between node clusters. Experiment results show that search cost is increased with the curve of the square of logarithm and maintenance cost is linearly increased as the scale of networks goes up. And also the maintenance cost and search cost will be minimized with proper cluster size.
Keywords/Search Tags:peer-to-peer networks (P2P), the small world phenomenon, text retrieval, overlay networks
PDF Full Text Request
Related items