
Research On Coding-based Fault-tolerance Repairing Optimizing Techniques For Distributed Storage

Posted on: 2018-08-11 | Degree: Master | Type: Thesis
Country: China | Candidate: S Ding | Full Text: PDF
GTID: 2428330512998258 | Subject: Computer technology
Abstract/Summary:
The explosive growth of data poses great challenges for storage. Distributed storage systems built on networked clusters of nodes offer better cost-performance and scalability than traditional storage arrays. To ensure reliability, such systems adopt a fault-tolerance mechanism, such as multi-copy replication, to cope with node failures in the cluster. Replication is simple and easy to use, but it incurs a large storage overhead and offers poor fault tolerance. Erasure codes were proposed for this reason: they have very low storage overhead, but the network overhead of repairing a failed node is too high. In recent years, regenerating codes have been put forward; by using network coding, they fundamentally reduce repair network traffic. At the same time, a growing body of research shows that repair delay can be effectively reduced by exploiting network link bandwidth when repairing failed data. However, because the encoding and decoding mechanism of regenerating codes differs from that of erasure codes and its computational overhead is high, it is difficult to apply regenerating-code techniques directly to an erasure-code system to reduce repair network overhead. Moreover, existing repair techniques based on network topology mostly target erasure codes and are not suitable for Local Reconstruction Codes (LRC), a simple and effective representative of locally repairable codes.

To address these problems, we exploit the low network bandwidth cost of regenerating codes when repairing failed nodes, and we design and implement a storage system based on regenerating codes. In view of the actual repair characteristics of LRC, we study in detail how to combine network topology and link bandwidth information to carry out single-node and multi-node failure repair in LRC. The main work of this thesis includes the following aspects:

1) To address the problem that the erasure coding scheme in Cumulus incurs too much network overhead when repairing a failed node, we design and implement a distributed storage mechanism based on regenerating codes. As a result, the system effectively reduces repair network overhead. By optimizing the reading, writing, and repairing of files, the system also achieves good read and write performance.

2) Because the algorithm for repairing failed nodes in LRC differs from that of ordinary erasure codes, we consider how to apply network topology effectively to reduce the network overhead of failed-node recovery. We study the repair process and characteristics of LRC nodes in detail and propose a tree-based repair algorithm that uses network topology and combines divide-and-conquer with a greedy strategy. We design a verification experiment, and the results show that, compared with star repair over direct links between nodes, our algorithm greatly reduces node repair delay.
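The local-repair property that distinguishes LRC from ordinary erasure codes can be illustrated with a minimal Python sketch. This is not the thesis's algorithm or system: it assumes a simplified code in which each local group carries a single XOR parity and global parities are omitted entirely, and the function names and the group size of 3 are hypothetical choices made only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_groups(data_blocks, group_size):
    """Split data blocks into local groups and append one XOR parity per group.

    Assumption: a simplified LRC-style layout with local parities only.
    """
    groups = []
    for start in range(0, len(data_blocks), group_size):
        members = list(data_blocks[start:start + group_size])
        groups.append(members + [xor_blocks(members)])
    return groups


def repair_in_group(group, lost_index):
    """Rebuild one lost block from the surviving blocks of its own local group.

    Returns the rebuilt block and the number of blocks that had to be read.
    """
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors), len(survivors)


# Six 4-byte data blocks, arranged in two local groups of three.
data = [bytes([i] * 4) for i in range(1, 7)]
groups = encode_local_groups(data, group_size=3)

# Lose the second block of the first group and repair it locally.
rebuilt, blocks_read = repair_in_group(groups[0], lost_index=1)
```

In this sketch a single lost block is rebuilt by reading only the three surviving blocks of its own group, whereas a conventional erasure code over the same six data blocks would need to read six blocks. Because the helper nodes of a repair are confined to one group, which nodes they are and how they are connected matters for repair delay, which is the setting the topology-aware repair work above addresses.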
Keywords/Search Tags:Distributed Storage, Node Repairing, Fault-tolerant, Network Topology