Font Size: a A A

Study On Multi-node Fault Recovery In Distributed Storage Systems

Posted on:2021-12-15Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:2518306470481274Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
With the development of Internet technology,massive data has become a trend.The distributed storage system has the advantages of sharing storage load,low cost,and high scalability,and is very suitable for the current mass data storage.In order to ensure the reliability when the storage node fails,various coding technologies are used in the system.The simplest and most commonly used encoding strategy is replication,but the need to store multiple copies makes the storage overhead prohibitive.Another encoding strategy is erasure coding.The erasure coding strategy has the smallest storage overhead for a given reliability requirement,resulting in higher repair bandwidth overhead.Based on the network coding method,Dimakis et al.proposed the concept of regenerative codes to achieve the best compromise between storage overhead and repair bandwidth overhead.The regenerative code can greatly reduce the amount of data transmitted during the repair of the faulty node,but it needs to connect more surviving nodes,and the repair is more local.Locally repairable codes(LRC)reduce the number of nodes that need to be connected when a faulty node is repaired by grouping storage nodes and generating group coding blocks.However,at present,the research of node fault repair focuses on single node,and the performance of multi node fault repair still needs to be improved.Therefore,this paper studies the fast repair of multi node fault as follows:(1)A new coding scheme based on LRC for non-uniform fault protection(GRC-NFP,group repairable codes based on non-uniform fault protection)is proposed for storage systems with hot and cold file.The disk read overhead repaired by small multi-fault nodes provides higher levels of protection for hot files and nodes with high probability of failure.First,the files are grouped hot and cold,and then the data blocks are sorted according to the failure probability of the stored target node.The data blocks are stored in high or low failure data groups and different numbers of group coding blocks are generated according to the failure probability of the data packets.Theoretical simulation show that hot files can be effectively protected and have locality of node failure repair that is lower than the group repair code(GRC).And for storage efficiency,although it is slightly higher than the Reed-Solomon code(RS codes),it is lower than GRC,so the overall performance of GRC-NFP is better.(2)Two construction algorithms for optimal binary local repair codes based on hypergraph factorization are proposed.The codes constructed by the two algorithms can be repaired in parallel using multiple repair sets when multi-node failures,that is,the number of connected nodes is small.Specifically,using the idea of factorization of a hypergraph,a linearly consistent regular hypergraph is transformed into linear encoding and then factorized.Whether the regularity of the subgraph obtained by the decomposition is the same or not determines the difference between the two construction algorithms.In the end,a variety of locality repair codes with the same locality and different availability as the original linear code were obtained,and the cold and hot properties of the data block were considered in the second structure,so that the hot data block has higher availability.Performance analysis shows that both construction algorithms can have the optimal code rate under the appropriate parameter value conditions,and can always reach the optimal limit without being limited by parameters.
Keywords/Search Tags:Distributed storage, Locally Repairable Codes, Group Repairable Codes, Group Repairable Codes based on Non-uniform Fault Protection, Factorization
PDF Full Text Request
Related items