
A Study Of Data Regeneration In Distributed Storage Systems

Posted on: 2013-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2248330395450883
Subject: Computer application technology
Abstract/Summary:
Distributed storage systems store large amounts of data across many storage nodes and protect data integrity by storing redundancy. To compensate for potential data loss, the level of redundancy must be maintained: when a node fails, the redundancy it held should be regenerated. MDS codes tolerate node failures better than replication, but at a significantly higher transmission cost during regeneration. Regenerating codes, a class of MDS codes, have been proposed to achieve an optimal trade-off between the storage space required for redundancy and the network traffic during regeneration. However, work so far has focused on minimizing the network traffic caused by regeneration and does not consider other costs that arise in practice, such as the time spent on regeneration and the number of participating nodes.

In this thesis, we investigate optimizations that improve regeneration performance without sacrificing data integrity, using both theoretical analysis and extensive simulation with real-world data. After presenting the state-of-the-art schemes for maintaining redundancy, we first propose a tree-structured regeneration process that exploits bandwidth heterogeneity in the network and thereby significantly reduces regeneration time. We then model the network with asymmetric links and design a regeneration process built from multiple parallel trees. In addition, based on the observation that the number of participating nodes affects the efficiency of regeneration, we pipeline the regeneration processes of multiple nodes to improve efficiency. Our analysis shows that the pipelined regeneration process significantly reduces the number of participating nodes, and thus the regeneration time and the network traffic, at the cost of marginal additional storage overhead and without sacrificing data integrity. Our design works with both random linear codes and regenerating codes, and supports regenerating either a single failure or multiple failures in batches.
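
The intuition behind tree-structured regeneration can be illustrated with a small sketch. It is not taken from the thesis; it assumes symmetric links, assumes that an intermediate node can combine the block it receives with its own so that every tree edge carries the same amount of data, and treats the slowest edge as the factor that determines regeneration time. Under those assumptions, a maximum spanning tree rooted at the replacement node maximizes the bottleneck bandwidth. The node names, bandwidth values, and function names below are hypothetical, and the asymmetric-link, multi-tree construction mentioned above is not modeled.

```python
import heapq

def symmetric(links):
    """Build a symmetric bandwidth table from {(u, v): bandwidth}."""
    bw = {}
    for (u, v), b in links.items():
        bw.setdefault(u, {})[v] = b
        bw.setdefault(v, {})[u] = b
    return bw

def star_bottleneck(bw, newcomer, providers):
    """Slowest direct link when every provider sends straight to the newcomer."""
    return min(bw[p][newcomer] for p in providers)

def tree_bottleneck(bw, newcomer, providers):
    """Grow a maximum spanning tree from the newcomer (Prim's algorithm).

    Returns (bottleneck bandwidth, parent map). Assumption: the tree's
    slowest edge bounds the regeneration rate.
    """
    in_tree = {newcomer}
    parent = {}
    # Max-heap via negated bandwidth: (-bandwidth, child, parent).
    heap = [(-bw[p][newcomer], p, newcomer) for p in providers]
    heapq.heapify(heap)
    bottleneck = float("inf")
    while heap and len(in_tree) <= len(providers):
        neg_bw, child, par = heapq.heappop(heap)
        if child in in_tree:
            continue  # stale entry: child was attached through a faster link
        in_tree.add(child)
        parent[child] = par
        bottleneck = min(bottleneck, -neg_bw)
        for p in providers:
            if p not in in_tree:
                heapq.heappush(heap, (-bw[p][child], p, child))
    return bottleneck, parent

# Hypothetical bandwidths (arbitrary units) between a newcomer and three providers.
bw = symmetric({
    ("A", "new"): 10, ("B", "new"): 2, ("C", "new"): 3,
    ("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 9,
})
providers = ["A", "B", "C"]
star = star_bottleneck(bw, "new", providers)
tree, parent = tree_bottleneck(bw, "new", providers)
print(star, tree, parent)
```

In this toy network the direct (star) transfer is limited by the link of bandwidth 2 between B and the newcomer, while the tree C -> B -> A -> new avoids both slow direct links and is limited only by the link of bandwidth 8.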
Keywords/Search Tags: distributed storage system, data regeneration, bandwidth heterogeneity, pipeline, random linear codes, regenerating codes