Research On Data Redundancy And Maintenance Technology In Distributed Storage System

Posted on: 2012-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
GTID: 1228330371952528
Subject: Computer application technology
Abstract/Summary:
Distributed storage systems are an effective means of solving the problem of mass data storage. Through the collaboration of a large number of nodes distributed across the network, they use redundant data maintenance techniques to provide long-term, reliable data storage services. Existing large-scale data centers, P2P network storage, and wireless networking technologies all fall into the category of distributed storage systems. However, some nodes in such a system may be temporarily or permanently disabled, so storage systems generally keep redundant data to ensure reliability and availability. Data redundancy and maintenance technology has therefore become an important research issue in distributed storage systems.

At present, the major problems facing data redundancy and maintenance technology in distributed storage systems are:

1) Whichever data redundancy strategy is adopted, we must understand the data reliability it provides in order to predict the probability of system failure, the amount of redundancy required, the system life cycle, and so on.
2) For different data redundancy strategies, we need to study more effective storage encodings.
3) When a distributed storage system uses erasure codes for redundancy, data repair consumes a large amount of network bandwidth, which some low-speed storage networks may be unable to tolerate. The data recovery method for erasure-code redundancy must be improved.
4) New distributed storage applications have shifted from traditional static file sharing to dynamic file interaction, and replicated files are updated frequently. The consistency of data replicas must therefore be taken into account.
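To make problems 1) and 3) concrete, the following is a minimal sketch, not the dissertation's DRSRM model. It assumes every node is online independently with the same probability `p`, and compares replication with an (n, k) erasure code in which any k of n blocks suffice to recover a file. The function names and the independence assumption are illustrative and do not come from the source.

```python
from math import comb


def replication_availability(p: float, r: int) -> float:
    """Probability that at least one of r replicas sits on an online node."""
    return 1.0 - (1.0 - p) ** r


def erasure_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n coded blocks sit on online nodes."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def naive_erasure_repair(file_size: float, k: int) -> tuple[float, float]:
    """Return (lost_bytes, downloaded_bytes) when one block of an (n, k)
    code is rebuilt by first decoding the whole file from k blocks."""
    block_size = file_size / k
    return block_size, k * block_size


if __name__ == "__main__":
    p = 0.9  # assumed per-node availability
    # Both schemes below store twice the original data.
    print("2 replicas:", replication_availability(p, 2))
    print("(6, 3) code:", erasure_availability(p, 6, 3))
    print("repair (lost, downloaded):", naive_erasure_repair(600.0, 3))
```

Under these assumptions the (6, 3) code is more available than two replicas at the same storage cost, but rebuilding one lost block by decoding downloads k times the size of that block, which is the bandwidth cost described in problem 3).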
In short, studying data redundancy and maintenance in distributed storage systems is of considerable theoretical and practical significance.

To address these problems, this dissertation investigates four aspects in depth: the reliability of different redundancy strategies, the realization of minimum-storage and minimum-bandwidth redundancy coding, the use of interference alignment techniques to repair redundant data, and the consistency maintenance of redundant data. The main research work and innovative contributions are as follows:

1. A mathematical model, DRSRM (Data Redundancy System Reliability Model), is proposed to help predict the reliability of redundant distributed storage systems. The data availability of redundancy maintenance based on replication and on erasure codes is analyzed. A mathematical model of storage node failure and repair is proposed, and the reliability of a storage node is analyzed. On this basis, a reliability forecast model for replicated-data redundancy storage systems is put forward; it simulates the maintenance process of redundant data and then calculates the system failure rate, the time period, the system life cycle, and so on.

2. A minimum storage redundancy regenerating code (MSRRC) and a minimum bandwidth redundancy regenerating code (MBRRC) are presented. According to the literature, data regeneration has two extreme points: the minimum bandwidth regeneration (MBR) point and the minimum storage regeneration (MSR) point. MBRRC and MSRRC are designed for these two points. We analyze the reconstruction and regeneration principles of the two types of code, prove the reliability of their realization principles, and describe the implementation process in detail through examples.
Finally, the experimental results demonstrate their effectiveness.

3. A redundant data maintenance technique based on interference alignment, RDMIA (Redundancy Data Maintenance based on Interference Alignment), is proposed to reduce network overhead when redundant data needs to be repaired. Its main advantages are: 1) A lost block can be repaired directly from a subset of the other encoded blocks, with no need to reconstruct the original data. 2) A failed block can be repaired from a fixed number of surviving encoded blocks. That number depends only on how many coded blocks are missing, not on which blocks were lost. Applying this technique can greatly reduce network costs when redundant data is maintained in a distributed storage system.

4. A replica broadcast tree (RBT) is proposed to maintain the consistency of redundant data. By constructing an RBT for every replica's key, the system can trace replica locations and propagate replica update information. The strategy effectively avoids the problems of hot spots and node failures. At the same time, the system stores replicas without explicitly recording the ID and IP address of nodes, so it can effectively protect the privacy of the nodes.
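The MSR and MBR points mentioned in contribution 2 can be illustrated with the closed-form expressions commonly given in the regenerating-codes literature. This is a sketch of those general formulas, not of the dissertation's MSRRC or MBRRC constructions. The symbols are assumptions borrowed from that literature: `M` is the file size, `k` the number of nodes needed to reconstruct the file, `d` the number of helper nodes contacted during a repair (with k <= d), `alpha` the storage per node, and `gamma` the total repair download.

```python
from fractions import Fraction


def msr_point(M: int, k: int, d: int) -> tuple[Fraction, Fraction]:
    """(alpha, gamma) at the minimum-storage regenerating point."""
    if not 1 <= k <= d:
        raise ValueError("requires 1 <= k <= d")
    size = Fraction(M)
    alpha = size / k                        # each node stores M/k
    gamma = size * d / (k * (d - k + 1))    # total download for one repair
    return alpha, gamma


def mbr_point(M: int, k: int, d: int) -> tuple[Fraction, Fraction]:
    """(alpha, gamma) at the minimum-bandwidth regenerating point."""
    if not 1 <= k <= d:
        raise ValueError("requires 1 <= k <= d")
    size = Fraction(M)
    alpha = 2 * size * d / (2 * k * d - k * k + k)
    return alpha, alpha                     # at MBR, download equals storage


if __name__ == "__main__":
    M, k, d = 1, 2, 3  # hypothetical parameters for illustration
    print("MSR:", msr_point(M, k, d))
    print("MBR:", mbr_point(M, k, d))
```

With d = k the MSR repair download equals the whole file size M, which matches repair by full decoding. Contacting more helpers (larger d) lowers the download at both points.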
Keywords/Search Tags: distributed storage system, reliability, redundancy data, coding, regenerating code, data consistency, maintenance, copy, erasure code, replica, information broadcast tree