Font Size: a A A

Study On Fault-tolerant Mechanisms Of Distributed Storage Systems Based On Network Coding

Posted on:2011-08-01Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y C HuFull Text:PDF
GTID:1118360305966768Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Currently, the information technology industry has stepped into the storage era from the computing era, which makes mass storage a trend.Distributed storage systems are based on network technology, and mainly use small servers or PCs to build storage pools, so the characteristics of low cost and high scalability are suitable for mass storage.However, due to the low availability of distributed storage nodes, ensuring high data reliability has become a crucial problem.To ensure data reliability depends on data fault-tolerant technologies, and the key issue of fault tolerance is how to repair effectively to minimize the system resources consumed by recovery. This dissertation studies fault-tolerant mechanisms of distributed storage systems based on network coding.Its main works and contributions are as follows:(1) Modeling of fault recovery in distributed storage systems:In this dissertation, the fault recovery problem in distributed storage systems is formulated as a transmission model based on information flow graph, so as to analyze the lower bound of the maintenance bandwidth with information flow theory. A virtual source node is introduced in the model, which makes that a sequential repair for multiple node failures can be converted into multiple independent single repairs, and significantlyy simplify the analysis of the recovery complexity. Using the model, it is proved that there is no need to transfer any data among survival nodes during the repair process, which provides certain theoretical basis for data transmission mechanism.(2) A flexible recovery mechanism for node failures:Existing repair mechanisms usually require that all nodes to be repaired must be connected to the same number of survival nodes to complete the repair. This dissertation presents a flexible recovery mechanism for node failures (Multi-loss Flexible Recovery, MFR), which allows different nodes to be repaired can be connected to different numbers of survival nodes to complete the repair. The lower bound of maintenance bandwidth based on MFR is analyzed, and a repair algorithm based on random linear network coding is designed.The algorithm reaches the lower bound of maintenance bandwidth.Therefore, the algorithm is the bandwidth-optimal repair algorithm based on MFR. By numerical comparisions of MFR with the existing best repair algorithm, the maintenance bandwidth is reduced by over 20% when the redundancy degree of data is low.(3) A mutually cooperative recovery mechanism for multi-node failures:This dissertation studies the problem of multi-loss recovery, and presents a mutually cooperative recovery mechanism (Mutually Cooperative Recovery, MCR). The lower bound of maintenance bandwidth based on MCR is analyzed, and a repair algorithm is proposed based on random linear network coding.This algorithm is proved to be valid based on the introducing of a strong-(n, k) property. It reaches the lower bound of maintenance bandwidth. Therefore, the algorithm is the bandwidth-optimal repair algorithm based on MCR. By numerical comparisions of MCR with the existing best repair algorithm, the maintenance bandwidth is reduced by 10% while the storage capacity is decreased by 20%.(4) Asymmetric fault recovery problem:The existing models are based on the symmetric repair mechanisms, namely each repair link consumes the same bandwidth.This dissertation analyzes the lower bound of maintenance bandwidth based on the asymmetric repair. It is found that the lower bound is greater than or equal to maintenance bandwidth comsumption of the proposed bandwidth-optimal repair algorithm based on MCR. So the algorithm is still a bandwidth-optimal algorithm in the asymmetric fault recovery problem.
Keywords/Search Tags:Distributed storage, Network coding, Information flow graph, Fault tolerance, Data recovery
PDF Full Text Request
Related items