A Study Of Data Regeneration In Distributed Storage Systems

Posted on:2013-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2248330395450883

Subject:Computer application technology

Abstract/Summary:

Distributed storage systems store large amounts of data across many storage nodes and maintain data integrity by storing redundancy. To compensate for potential data loss, the level of redundancy must be maintained: when a node fails, the redundancy it held must be regenerated. MDS codes tolerate node failures better than replication, but incur a significantly higher transmission cost during regeneration. Regenerating codes, a class of MDS codes, have been proposed to achieve an optimal trade-off curve between the storage space required for redundancy and the network traffic during regeneration. However, work so far has focused on minimizing the network traffic caused by regeneration and fails to consider other costs that arise in actual regeneration scenarios, such as the time spent on regeneration and the number of participating nodes.

In this thesis, we investigate optimizations that improve regeneration performance without sacrificing data integrity, using both theoretical analysis and extensive simulation with real-world data. After presenting the current state-of-the-art schemes for maintaining redundancy, we first propose a tree-structured regeneration process that exploits bandwidth heterogeneity in the network and thus significantly reduces regeneration time. We then model the network with asymmetric links and design a construction of the regeneration process with multiple parallel trees. In addition, based on the observation that the number of participating nodes affects the efficiency of regeneration, we pipeline the regeneration processes of multiple nodes to improve efficiency. Based on our analysis, we demonstrate that the pipelined regeneration process significantly reduces the number of participating nodes, and thus the regeneration time and network traffic, while introducing only marginal additional storage overhead and without sacrificing data integrity. We show that our design works with both random linear codes and regenerating codes, and supports regenerating either a single failure or multiple failures in batches.
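
The intuition behind tree-structured regeneration can be illustrated with a small sketch. It is not the construction from the thesis; it assumes, for illustration only, that every link in the tree carries one equally sized coded fragment (as in-network combining of linear codes would allow), so that regeneration time is governed by the slowest link used. Under that assumption, a maximum spanning tree rooted at the replacement node maximizes the bottleneck bandwidth. The function names and the bandwidth figures are hypothetical.

```python
def symmetric(links):
    """Build a symmetric bandwidth table from {(u, v): bandwidth} pairs."""
    bw = {}
    for (u, v), w in links.items():
        bw.setdefault(u, {})[v] = w
        bw.setdefault(v, {})[u] = w
    return bw


def star_bottleneck(bw, newcomer, providers):
    """Slowest link when every provider sends directly to the newcomer."""
    return min(bw[p][newcomer] for p in providers)


def tree_bottleneck(bw, newcomer, providers):
    """Prim-style maximum spanning tree rooted at the newcomer.

    Returns (bottleneck bandwidth, parent map). Assumes each tree edge
    carries the same amount of data (illustrative assumption).
    """
    in_tree = {newcomer}
    remaining = set(providers)
    parent = {}
    bottleneck = float("inf")
    while remaining:
        # Attach the provider reachable over the widest link into the tree.
        child, par = max(
            ((c, p) for c in remaining for p in in_tree),
            key=lambda edge: bw[edge[0]][edge[1]],
        )
        parent[child] = par
        bottleneck = min(bottleneck, bw[child][par])
        in_tree.add(child)
        remaining.remove(child)
    return bottleneck, parent


# Hypothetical bandwidths (e.g. in Mbit/s) between a newcomer and three providers.
bw = symmetric({
    ("new", "a"): 100, ("new", "b"): 10, ("new", "c"): 20,
    ("a", "b"): 80, ("a", "c"): 30, ("b", "c"): 90,
})
providers = ["a", "b", "c"]
print(star_bottleneck(bw, "new", providers))   # 10
print(tree_bottleneck(bw, "new", providers))   # bottleneck 80, chain c -> b -> a -> new
```

In this toy network the direct (star) transfer is limited by the 10-unit link from provider b, whereas relaying through faster neighbours raises the bottleneck to 80.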

Keywords/Search Tags:

distributed storage system, data regeneration, bandwidth heterogeneity, pipeline, random linear codes, regenerating codes

Related items

1	Research Of Regeneration Codes In Distributed Storage Systems
2	Research On Data Security And Coding Techniques For Distributed Storage Systems
3	Optimization Algorithm For Data Reconstruction In Distributed Storage Systems
4	Research On Fault-tolerant Technology Of Cloud Storage System In Big Data Environment
5	A New Class Of Regenerating Codes In Distributed Storage Systems And Research On Its Decoding Algorithm
6	Research On Exact Repair Of Regenerating Code Based On Distributed Storage System
7	Research On Repair Mechanism Of Failure Nodes In Distributed Storage Systems
8	Study On Fast Repair Of Failed Nodes In Distributed Storage Systems
9	Research On Secure Distributed Storage Architecture And Fault-tolerant Technique In Cloud Computing
10	Research On Regenerating Codes With Low Complexity For Distributed Storage Systems