Optimization Algorithm For Data Reconstruction In Distributed Storage Systems

Posted on: 2018-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
GTID: 2348330533461353
Subject: Computer Science and Technology
Abstract/Summary:
Distributed storage systems have become an important means of storing massive data because of their low cost and high scalability. To maintain data reliability, a distributed storage system stores redundant data so that it can repair lost data automatically when a node fails. To compensate for the loss, the corresponding amount of data is generally regenerated on a new node after a failure. Compared with replication and erasure codes, regenerating codes not only provide higher tolerance of node failures but also incur lower storage and bandwidth costs. However, research in this field has so far focused on how to construct regenerating codes and on the parameters of the regeneration process. It does not consider the costs that arise in real regeneration scenarios, such as the way regeneration time varies with the heterogeneity of nodes and link bandwidths. In this thesis, we investigate optimizations that improve regeneration performance while preserving the repair property of regenerating codes. The main contributions of this thesis are as follows:

(1) After reviewing two traditional fault-tolerance schemes for distributed storage systems, we give a detailed introduction to coding theory and to the current state of network codes and regenerating codes.

(2) We propose dividing the regeneration process into four parts, in chronological order: data distribution, data encoding and decoding, data transmission, and data reconstruction. We then introduce a data transmission model based on a tree topology, which is effective for the data transmission part.

(3) We analyze the limitations of the tree topology and propose a new regeneration process based on awareness of node capability. The new process selects highly reliable nodes to transmit data for recovery and avoids transmission links with poor bandwidth. In this way, the transmission procedure achieves a high transmission rate and high data availability, and data transmission efficiency increases as well.

(4) Combining cooperative regenerating codes with the tree-topology transmission model, we propose a new regeneration process, called the cooperative regeneration process based on tree topology. Two design schemes are proposed for constructing a tree topology for every new node in the regeneration process. Experimental results show that the two designs proposed in this thesis further reduce the time overhead of regenerating codes and achieve better data availability.
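The idea of routing regeneration traffic over a tree that avoids low-bandwidth links can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes, for illustration only, that every link in the tree carries the same amount of data, so regeneration time is set by the slowest link, and it builds a maximum spanning tree rooted at the new node with a Prim-style search. The names `max_bottleneck_tree`, `bottleneck`, the node labels, and the bandwidth values are all hypothetical.

```python
import heapq

def link_bw(bandwidth, u, v):
    """Bandwidth of the undirected link u-v (0.0 if the link is absent)."""
    return bandwidth.get((u, v), bandwidth.get((v, u), 0.0))

def max_bottleneck_tree(bandwidth, newcomer, providers):
    """Grow a tree from `newcomer`, always attaching the next node over the
    widest available link (Prim's algorithm on negated weights).
    Returns a dict mapping each provider to its parent in the tree."""
    nodes = set(providers) | {newcomer}
    in_tree = {newcomer}
    parent = {}
    heap = [(-link_bw(bandwidth, newcomer, p), newcomer, p) for p in providers]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(nodes):
        neg_bw, u, v = heapq.heappop(heap)
        if v in in_tree or neg_bw == 0:
            continue  # already attached, or no usable link
        in_tree.add(v)
        parent[v] = u
        for w in nodes - in_tree:
            heapq.heappush(heap, (-link_bw(bandwidth, v, w), v, w))
    return parent

def bottleneck(bandwidth, parent):
    """Slowest link used by the tree (assumed to bound the regeneration rate)."""
    return min(link_bw(bandwidth, child, par) for child, par in parent.items())

# Hypothetical example: newcomer N, providers A, B, C, bandwidths in MB/s.
bandwidth = {
    ("N", "A"): 10.0, ("N", "B"): 2.0, ("N", "C"): 3.0,
    ("A", "B"): 8.0, ("B", "C"): 6.0, ("A", "C"): 1.0,
}
providers = ["A", "B", "C"]

tree = max_bottleneck_tree(bandwidth, "N", providers)
tree_rate = bottleneck(bandwidth, tree)
# Star topology: every provider sends directly to the newcomer.
star_rate = min(link_bw(bandwidth, "N", p) for p in providers)
```

In this toy network the tree relays C through B and B through A, so its slowest link is 6.0 MB/s, whereas the star topology is limited by the 2.0 MB/s link between N and B. Node reliability, which the thesis also takes into account, is not modeled here.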
Keywords/Search Tags:Distributed storage system, Data recovery, Regenerating codes, Data transmission model, Regeneration time