Research On The Resource Retrieval For Peer To Peer Networks

Posted on: 2008-08-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 1118360272966902
Subject: Computer system architecture
Abstract/Summary:
With the rapid growth of the Internet and of computing power, peer-to-peer (P2P) systems have attracted much attention from both industry and academia. P2P systems share idle CPU power, free disk space and network bandwidth among peer nodes in a distributed and equal way. Except for centralized systems based on an index server, P2P systems can be roughly classified into two categories: unstructured P2P systems and DHT-based structured P2P systems. As with any heavily used large distributed system, the effectiveness of a P2P system depends largely not only on its topology, but also on the versatility and scalability of its retrieval mechanism.

Resource retrieval mechanisms for P2P systems can be classified into two categories: keyword-based retrieval and fulltext-based retrieval. Resource retrieval in unstructured P2P systems is inherently blind, which makes search inefficient and unscalable. Structured P2P networks provide search efficiency and scalability through identifier-based retrieval, but they fail to support the flexible full-text retrieval that unstructured P2P systems can offer.

Research on resource retrieval in P2P systems has two important areas: the logical topology of the network and the placement of the resource index.

For P2P systems that support keyword-based retrieval, a Landmark scheme is first proposed that groups all peer nodes into several clusters based on the physical topology of the network, so that peers in the same cluster have low link latency to each other and peers in different clusters have high link latency. This ensures that most routing stays within a single cluster, which avoids the "reroute" problem in the Chord system and reduces both the routing time and the number of messages.
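The abstract does not say how the Landmark scheme assigns peers to clusters. As a minimal sketch, assuming the common landmark-ordering approach (each peer measures its round-trip time to a fixed set of landmark hosts, and peers that rank the landmarks in the same order share a cluster), the grouping could look like this. The function names and the latency figures are hypothetical, not taken from the dissertation.

```python
from collections import defaultdict


def landmark_order(rtts_ms):
    """Return landmark indices sorted from nearest to farthest."""
    return tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))


def cluster_peers(peer_rtts):
    """Group peers whose landmark ordering is identical (assumed rule)."""
    clusters = defaultdict(list)
    for peer, rtts in peer_rtts.items():
        clusters[landmark_order(rtts)].append(peer)
    return dict(clusters)


# Hypothetical round-trip times (ms) from four peers to three landmarks.
peer_rtts = {
    "A": [10.0, 80.0, 150.0],
    "B": [12.0, 95.0, 140.0],
    "C": [120.0, 30.0, 60.0],
    "D": [130.0, 25.0, 70.0],
}
clusters = cluster_peers(peer_rtts)
```

Peers A and B see the landmarks in the same order and land in one cluster, as do C and D, which is the property the scheme relies on to keep most routing hops between low-latency neighbours.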
Then, because P2P workloads exhibit temporal and spatial locality, just as web traffic does, and users tend to retrieve data of a kind they are interested in, the resource index is stored according to resource semantics, so that resources of the same kind are placed in the same cluster. Next, a class cache table is used to cache the identifier of each recently searched resource kind together with the identifier of the peer node where resources of that kind are stored. If a resource of the same kind is searched for again, the cached information can be used directly. Lastly, because the small-world phenomenon has been observed to be pervasive in networks, a non-deterministic caching scheme is given to reduce the maintenance cost of updating the routing cache table, and an SW cache replacement scheme based on the small-world paradigm is proposed in place of the traditional LRU scheme to further improve object lookup performance. Both theoretical analysis and simulations show that this scheme improves lookup performance and reduces maintenance cost for the same routing table size.

For P2P systems that support full-text retrieval, the placement of the resource index is considered first. DOC-Tree, a height-balanced tree structure for organizing data objects in vector format in the P2P system, is proposed to reduce the time complexity of search. A simple strategy for placing the tree's nodes is given, which guarantees both load balance and fault tolerance. After that, the TRES-CORE searching scheme is used to reduce search time in the distributed environment. The resource index is extracted using the vector space model, which results in a resource vector space with hundreds or thousands of dimensions.
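To illustrate why the vector space model produces such high-dimensional indexes, here is a minimal sketch in which every distinct term becomes one dimension and a query is matched by cosine similarity. Raw term counts are an assumption made for brevity (the dissertation may weight terms differently), and the documents and helper names are invented for the example.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """One dimension per distinct term across all documents."""
    return sorted({term for doc in docs for term in doc.lower().split()})


def to_vector(text, vocabulary):
    """Raw term-frequency vector (assumed weighting) over the vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical three-document collection.
docs = [
    "peer to peer file sharing",
    "distributed hash table routing",
    "peer routing table",
]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]

query = to_vector("hash table", vocab)
scores = [cosine(query, v) for v in vectors]
best = max(range(len(docs)), key=lambda i: scores[i])
```

Even this toy collection needs eight dimensions for three short documents. A real corpus grows the vocabulary, and therefore the vector length, into the hundreds or thousands.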
A dimension-reduction technique based on rough sets is therefore presented to improve the efficiency of the search mechanism.

The logical topology of the network is the other research area for P2P systems that support full-text retrieval. First, a general hierarchical model for resource retrieval in P2P systems is presented, along with the interfaces between its levels. Then, a semi-structured, hybrid logical network structure (SSH) is proposed, which gives the system good scalability and the search mechanism good efficiency, while also avoiding a system bottleneck and supporting full-text retrieval. In the SSH model, all peer nodes are partitioned into several peer clusters according to their physical locations. Each peer cluster contains one super peer (SP) and several ordinary peers (OP). Super peers are organized by a distributed hash table (DHT), and the peer nodes within each cluster are organized in an unstructured way. Finally, experimental results show that this design improves resource retrieval in the P2P system.
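The two-tier layout of the SSH model can be pictured with a rough sketch. The abstract gives no protocol details, so everything below is an assumption: super peers are placed on a hash ring and a key is routed to its successor (a generic DHT rule), and the search inside a cluster simply asks every ordinary peer, standing in for unstructured flooding. All class, function and peer names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=16):
    """Hash a name onto a ring of 2**bits positions (hypothetical ID scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Cluster:
    """One super peer plus ordinary peers holding plain-text documents."""

    def __init__(self, super_peer, ordinary_peers):
        self.super_peer = super_peer
        self.ordinary_peers = ordinary_peers  # peer name -> list of documents

    def search(self, term):
        # Stand-in for unstructured search: ask every ordinary peer in turn.
        return sorted(
            peer
            for peer, docs in self.ordinary_peers.items()
            if any(term in doc.lower().split() for doc in docs)
        )


class SuperPeerRing:
    """Structured tier: super peers ordered by their ring position."""

    def __init__(self, clusters):
        self.ring = sorted(
            ((ring_id(c.super_peer), c) for c in clusters),
            key=lambda entry: entry[0],
        )
        self.ids = [position for position, _ in self.ring]

    def responsible_cluster(self, key):
        # Successor rule: first super peer at or after the key, wrapping around.
        idx = bisect_left(self.ids, ring_id(key)) % len(self.ids)
        return self.ring[idx][1]

    def lookup(self, key, term):
        # Route on the ring first, then search inside the chosen cluster.
        return self.responsible_cluster(key).search(term)


clusters = [
    Cluster("sp-east", {"op1": ["smooth jazz record"], "op2": ["rock anthology"]}),
    Cluster("sp-west", {"op3": ["jazz piano trio"], "op4": ["opera highlights"]}),
]
ring = SuperPeerRing(clusters)
hits = ring.lookup("music", "jazz")
```

The ring step touches only super peers, while the free-text match happens only inside one cluster, which is one way to combine the scalability of DHT routing with flexible matching.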
Keywords/Search Tags: Peer-to-Peer network, Topology, Keyword-based retrieval, Fulltext-based retrieval, Distributed hash table, Small world model, Vector space, Dimensional Reduction