Research On Data Consistency Maintenance Of Massive Data In Peer-to-Peer Distributed Storage Systems

Posted on: 2008-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhou
GTID: 1118360242999358
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, peer-to-peer (P2P) computing has become a popular network computing paradigm, and research on and applications of P2P computing have spread into many fields. P2P distributed storage systems, built on P2P computing techniques, can offer data sharing and storage services for massive numbers of users and massive amounts of data. Data replication, one of the crucial technologies for managing massive data, improves data availability and data access performance, but it also brings the problem of maintaining data consistency. Compared with other distributed systems, P2P systems are large-scale, dynamic, and heterogeneous, which poses many challenges for data consistency maintenance in P2P distributed storage systems. Based on the characteristics of massive data and P2P systems, this dissertation presents an intensive study of data consistency maintenance techniques for P2P distributed storage systems. The results are as follows:

The large number of replicas in P2P distributed storage systems aggravates inconsistency and load imbalance among them. This dissertation proposes a multi-replica clustering management method based on limited coding, RCLC, to solve the resource management problems caused by a large number of replicas. In this method, replicas are partitioned into hierarchies and clusters according to the process by which new replicas are created from a single existing replica. Replicas are then coded and managed according to a user-defined limited-coding rule, LCR, consisting of replica hierarchy and replica sequence, which also handles effectively the changes in clusters caused by dynamic adjustments to replicas.
After that, replicas are organized under a management model that is centralized within each local cluster and peer-to-peer across the wide area. Combined with a defined minimal time of update propagation, within which all updates in a local cluster can be merged, this greatly reduces the cost of reconciling consistency. The performance evaluation shows that RCLC is an effective way to manage a large number of replicas and achieves good scalability.

Because both the number of data objects and the size of each object may be large for massive data, it is important to improve update propagation and minimize space overhead. To this end, an optimistic data consistency maintenance method, PLCP, is proposed to improve update propagation and reduce the space overhead of the write-log. In this method, the home replica is used to resolve update conflicts. A new anti-entropy partner selection method, DAPS, based on the distribution of updates and the information in the local write-log, is presented. The method analyzes the conditions under which updates can be removed from the write-log, and applies write-log truncation during update propagation to remove out-of-date updates promptly. The simulation results show that DAPS achieves good adaptability and that PLCP lowers the time and space overhead of update propagation.

Data dependence in data consistency takes the form of false-conflict updates and update dependency. An optimistic data consistency maintenance method, DACP, is proposed to solve the problems of data dependence in P2P distributed storage systems. In this method, each data object is partitioned into fixed-size data blocks, which serve as the basic unit of data management. Updates are compressed with a Bloom filter technique and propagated along a double path. Negotiation algorithms detect and reconcile update conflicts, and dynamic data management algorithms accommodate dynamic data processing. Data partitioning eliminates false-conflict updates, and the negotiation algorithms resolve update dependency.
The performance evaluation shows that DACP is an efficient way to achieve consistency, with good dynamic behavior and strong robustness, when the data block size is chosen appropriately. A feasible way of choosing an appropriate data block size is also put forward.

Because P2P systems are generally large-scale and highly distributed, updates in P2P distributed storage systems may be delayed, which degrades resource location performance on the Internet. An optimistic data consistency maintenance method, KACP, is proposed on the basis of the characteristics of key-attribute updates in P2P systems: simple description, few update items, and weak dependence. In this method, the update of key attributes is separated from the user update request. Key-updates are propagated by a latency-overlay update propagation model. Based on a classification of key-update conflicts, a double-level reconciling mechanism comprising buffer preprocessing and update-log processing is applied to detect and reconcile conflicts. With this optimistic handling, key-updates are not delayed and therefore do not degrade the efficiency of resource location based on key attributes, which suits P2P systems on the Internet well. The simulation results show that KACP is an effective optimistic data consistency maintenance method, achieving good resource location performance and resource access overhead.
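
The abstract states that partitioning a data object into fixed-size blocks eliminates false-conflict updates in DACP. The Python sketch below illustrates that general idea only and is not the dissertation's algorithm. It assumes an update can be described as a byte range `(offset, length)`, and the names `BLOCK_SIZE`, `touched_blocks`, and `conflicts`, as well as the 4096-byte block size, are hypothetical choices made for illustration.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes; the source gives no value


def touched_blocks(offset, length, block_size=BLOCK_SIZE):
    """Return the indices of the fixed-size blocks covered by a write
    of `length` bytes starting at byte `offset`."""
    if length <= 0:
        return set()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return set(range(first, last + 1))


def conflicts(update_a, update_b, block_size=BLOCK_SIZE):
    """Treat two concurrent updates, each an (offset, length) pair, as
    conflicting only if they touch at least one common block."""
    blocks_a = touched_blocks(update_a[0], update_a[1], block_size)
    blocks_b = touched_blocks(update_b[0], update_b[1], block_size)
    return bool(blocks_a & blocks_b)


if __name__ == "__main__":
    a = (0, 100)     # write to the start of the object
    b = (8192, 100)  # write two blocks further in

    # With block-level tracking the two writes are independent.
    print(conflicts(a, b))

    # With one huge block (whole-object granularity) the same pair is
    # flagged, which is the false conflict that partitioning avoids.
    print(conflicts(a, b, block_size=1 << 20))
```

Under these assumptions, a smaller block size reports fewer false conflicts but requires tracking more blocks per object, which is one reason the choice of block size matters.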
Keywords/Search Tags: peer-to-peer computing, massive data, data replication, data consistency, update conflict