Font Size: a A A

Study On Fault-tolerant And Scaling Problems Of Distributed Storage Systems Based On Netork Coding

Posted on:2014-02-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:H T ZhaoFull Text:PDF
GTID:1228330395458599Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Nowadays, the information technology industry is growing very fast. It has stepped into the storage era from the computing era, which makes massive storage a trend. Distributed storage systems are suitable for massive storage and becoming more and more widely used because of its low cost and high scalability. Distributed storage systems ensure high data reliability through redundant storage. And a good failure recovery mechanism that can repair failed nodes efficiently is needed to maintain the redundancy of a system when failures occur. Compared to traditional redundancy strategies (replication, erasure codes), network coding based distributed storage (regenerating codes) reduces the storage cost of ensuring system reliability and the bandwidth cost of repairing failed nodes. So it has good application prospects. We want to study the coding methods and failure recovery mechanisms of distributed storage systems based on regenerating codes. Another key problem of distributed storage is system scaling. For a regenerating codes based distributed storage system, how to change its scale efficiently is also an important problem that needs to be solved.This dissertation studies the failed nodes recovery problems and system scaling problems of network coding based distributed storage systems. Its main works and contributions are as follows:(1) A deterministic algorithm of single failed node recovery in MSR (Minimum Storage Regenerating codes) based distributed storage systemsA good recovery mechanism for failed nodes is very important to a distributed storage system. This dissertation studies the single failed node recovery problem of MSR based distributed storage systems. Through analysis of the two encoding steps in the node recovery process, we give the conditions of each encoding step. Then we give a practical deterministic algorithm that meets these conditions, with reference to the deterministic algorithms of intermediate nodes in multicast problems. Compared to traditional random algorithms, our deterministic algorithm can100%ensure reconstruction property, and that brings the system a higher reliability. Meanwhile our algorithm only require a much smaller finite field, which greatly reduce the computational cost and memory cost in the encoding/decoding process.(2) An explicit construction of exact minimum bandwidth cooperative regenerating codesThe situation of recovery from multiple losses often occurs in distributed storage systems. For multiple losses recovery of E-MBCR (Exact Minimum Bandwidth Cooperative Regenerating codes) code, currently there only exist a construction for the case of r=n-k. Although it can repair r’<n-k nodes failure but the bandwidth cost and storage cost are high. In this dissertation, we show the existence of any (n, k) E-MBCR code that supporting any r(2≤r≤n-k) failed nodes recovery which meets the cut-set lower bound of repair bandwidth. We also give an explicit construction of such codes with corresponding failed nodes recovery and original file construction methods. Our method reduces the repair bandwidth through multiple nodes cooperative recovery, and maintains the existence of original file blocks through exact repair to improve the file access performance. Compared to the E-MBCR code for the case of r=n-k, our method reduces the repair bandwidth cost and system storage cost.(3) Scaling up of E-MSR based distributed storage systemsDistributed storage systems are often faced with not sufficient storage capacity and access performance, which brings the requirement of system scaling. This dissertation firstly raises the scaling problem of regenerating codes based distributed storage systems, and studies the scaling problem of storage capacity of E-MSR (Exact Minimum Storage Regenerating codes) based systems. We formulate the problem and provide its limiting conditions and optimization objectives. We mainly focus on the optimization of bandwidth cost in the scaling process, and indicate that the key point is to make the encoding matrices of the systems before and after scaling having more correlation. Through constructing encoding matrices of smaller system using the encoding matrices of larger system, we make the encoding matrices having large parts keeping unchanged in the scaling process. So the redundancy nodes only need to download a small part of data to update their encoded blocks, which reduces the bandwidth cost of the whole scaling process.
Keywords/Search Tags:Distributed storage, Network coding, Regenerating codes, Failed nodesrecovery, System scaling
PDF Full Text Request
Related items