Research On Repair Mechanism Of Failure Nodes In Distributed Storage Systems

Posted on: 2020-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Sun
Full Text: PDF
GTID: 2428330602951835
Subject: Communication and Information System
Abstract/Summary:
The amount of data generated in networked environments continues to grow rapidly, and distributed storage is increasingly widely applied as an emerging online storage technology. Because of hardware and software failures, human error, and other factors, node failure has become common. To maintain high system reliability, failed nodes must be repaired promptly, so designing a good repair mechanism for failed nodes is a key issue in distributed storage systems. Distributed storage systems usually guarantee data reliability through a redundancy strategy. There are two main traditional redundancy strategies: replication and erasure codes. Replication is easy to implement and deploy, but its storage overhead per node is very large. Compared with replication, erasure codes reduce the storage overhead of nodes and improve storage efficiency while preserving system reliability. However, the repair bandwidth overhead of erasure codes is too large, because the amount of data transmitted to repair a failed node is as much as the entire original file. To address the shortcomings of these two strategies, the regenerating code strategy introduces the idea of network coding. While keeping storage overhead low, regenerating codes effectively reduce the repair bandwidth overhead, and therefore have broad application prospects. This paper focuses on node repair mechanisms based on regenerating codes. The main work is as follows:

(1) Because single-node failure is the most common case, designing a good single-node repair mechanism is very important for distributed storage systems. Traditional complete graph MBR codes incur a large disk read overhead when repairing a failed node, and their computational complexity is high because of MDS encoding and decoding over a finite field. To solve these problems, this paper proposes a coding scheme with good repair locality based on complete graph MBR codes. In the proposed scheme, the nodes of the distributed storage system are divided into multiple repair groups, and the original file is stored across these repair groups. The proposed scheme can repair a single node exactly. Compared with traditional complete graph MBR codes, it greatly reduces the disk read overhead during node repair, and further reduces the storage overhead and bandwidth overhead of nodes for the same values of n and k. In addition, encoding and decoding in the proposed scheme require only simple XOR operations, which reduces their computational complexity.

(2) In real distributed storage systems, multiple nodes sometimes fail simultaneously, especially when device stability is poor and the environment is complex. In addition, many distributed storage systems adopt a "delay repair" strategy. This paper therefore further studies the repair mechanism for multi-node failure. The traditional MSCR coding scheme requires a large number of transmission channels during node repair, which makes the repair process complex and repair stability poor. To solve this problem, this paper proposes a coding scheme with surviving-node cooperation based on the MSCR coding scheme. Theoretical analysis shows that the proposed scheme solves the problem of the large number of transmission channels required in the MSCR repair process. While guaranteeing the same low storage overhead and repair bandwidth overhead as the MSCR coding scheme, the proposed scheme simplifies the repair process, reduces the transmission channel overhead of node repair, lowers the probability of repair failure, and improves repair reliability.
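The trade-off between storage overhead and repair bandwidth described in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and does not implement the thesis's schemes: it uses the standard single-node-repair operating points from the regenerating-code literature (MSR for minimum storage, MBR for minimum repair bandwidth), which are not stated in this abstract. The symbols are assumptions introduced here: `M` is the file size, `k` is the number of nodes sufficient to recover the file, and `d` is the number of helper nodes contacted during repair.

```python
from fractions import Fraction


def mds_erasure(M, k):
    """(n, k) MDS erasure code with conventional repair.

    Each node stores M/k; repairing one node downloads k fragments,
    i.e. as much data as the whole original file.
    Returns (per-node storage, repair bandwidth).
    """
    return Fraction(M, k), Fraction(M)


def msr(M, k, d):
    """Minimum-storage regenerating (MSR) point, k <= d helpers.

    Storage matches the MDS code; repair bandwidth is M*d / (k*(d-k+1)).
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    return Fraction(M, k), Fraction(M * d, k * (d - k + 1))


def mbr(M, k, d):
    """Minimum-bandwidth regenerating (MBR) point, k <= d helpers.

    Per-node storage equals repair bandwidth: 2*M*d / (k*(2*d-k+1)).
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    value = Fraction(2 * M * d, k * (2 * d - k + 1))
    return value, value


if __name__ == "__main__":
    M, k, d = 12, 3, 4  # hypothetical example parameters
    for name, (alpha, gamma) in [
        ("MDS erasure code", mds_erasure(M, k)),
        ("MSR", msr(M, k, d)),
        ("MBR", mbr(M, k, d)),
    ]:
        print(f"{name}: per-node storage = {alpha}, repair bandwidth = {gamma}")
```

With the hypothetical values M = 12, k = 3, d = 4, conventional erasure-code repair transfers 12 units (the whole file), the MSR point transfers 8 units with the same per-node storage of 4, and the MBR point transfers 16/3 units at the cost of storing 16/3 units per node.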
Keywords/Search Tags: distributed storage system, regenerating codes, node repair, complete graph MBR codes, MSCR codes