Font Size: a A A

Research Of Data Location And Data Replication In Hierarchical P2P Systems

Posted on:2007-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:X M ZhangFull Text:PDF
GTID:2178360215970250Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Data location is an important basic service in large-scale peer-to-peer systems. A path of data location on the P2P overlay network consists of a series of application-level hops, not IP-level hops. However, in most P2P systems, little effort is made to ensure that this application-level connectivity is congruent with the underlying IP-level network topology. This in turn can lead to inefficient routing. By using the location information of nodes, the latency of data location can be reduced greatly. According to the topology, peer-to-peer systems can be divided into three categories: centralized topology, purely decentralized topology and hierarchical topology. The purely decentralized topology includes unstructured topology and structured topology. Though the data location of centralized P2P systems is efficient, their centralized servers may become performance bottlenecks. Though unstructured P2P systems are flexible, the flooding mechanism of data location induces that they can't scale to large network systems. Though structured P2P systems are scalable, their flat DHT designs result in mismatch of peers between their capabilities and load. The hierarchical P2P systems can combines advantages of both centralized and decentralized P2P systems. Recently it has become a research hot that hierarchical structure is introduced to improve the performance of P2P systems. As the unreliability of peers in P2P systems, data replication algorithms are used to improve data availability. Traditional data replication algorithms have a low utilization ratio of storage space. Data replication algorithm based on erasure coding can improve the utilization ratio of storage space greatly. In this thesis, location-aware data location and data replication based on erasure coding in hierarchical P2P systems are deeply researched.To resolve the incongruity between P2P overlay network and underlying network, a distributed topology-aware overlay construction algorithm Lanet is proposed, in which the overlay network construction is based on the location information of underlying topology. Nodes that are close in the logical network are also close in terms of network latency. The expected delay in each hop of message routing is low, which leads to low latency stretch of data location. This algorithm is scalable and distributed. This algorithm can be used to construct location-aware systems that are structured or unstructured. Simulation results indicate that the latency of data location in these systems constructed by Lanet can be reduced greatly.To improve the performance of P2P systems by exploiting hierarchical topology, a construction algorithm Chpp of hierarchy-adaptive P2P topology DAHP2P and a hierarchical routing algorithm Hroute are proposed. Peers are grouped into clusters according to proximity, and super peer that is the strongest peer in each cluster is selected to provide serves for client peers. Super peers in the same tier form the up-tier overlay network, and the number of system tiers is changed self-adaptively according to the number of nodes in the system. A hierarchical routing algorithm is designed to reduce the routing hops and latency. Simulation results show that Hroute can locate data within O(l) hops(l is the number of system tiers) , and loads of superpeers in different tier are relatively balanceable.To improve data availability in hierarchical P2P systems, a novel replication algorithm Dyre based on LDPC coding is proposed. Data are encoded into blocks that contain redundant information with LDPC encoding algorithm. Dynamic replica placement is used to store blocks on different clusters, and the location of each block is determined by the key of that data and the index of that block. Data in invalid peers can be retrieved from predecessor and successor peers. When the number of available blocks is too small, which will induce that data can't be reconstructed, the data is restored. Simulation results show that the algorithm gains higher data availability than traditional replication algorithms, and the time cost of this algorithm is lower than that of replication algorithm in Oceanstore.
Keywords/Search Tags:peer-to-peer computing, overlay network, location-aware, data location, data replication, data availability
PDF Full Text Request
Related items