Font Size: a A A

Erasure Codes For Fast Data Repairing In Distributed Storage Systems

Posted on:2018-09-17Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q LiuFull Text:PDF
GTID:1318330515983381Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
To avoid data unavailability caused by the device failure and network breakdown,era-sure codes are widely used in distributed storage systems for data reliability.Traditional era-sure codes encounter a big repair-cost problem:the amount of data required during a repair is much larger than the size of the lost data.Huge repair data would consume valuable disk I/O and network bandwidth,and indirectly weakens the reliabilty of the system by expos-ing the system to a long window of vulnerability in which additional node failures can lead to unrecoverable data loss.Recently,researchers have proposed some reapir-cost-efficient erasure codes,but they either sacrifice some beneficial properties,such as the minimum storage,or only appliable for some specific coding parameters.In addition,data encoding and data repairing of erasure codes need lots of computation resources,and how to reducing such computation cost is one of key researches in erasure codes' field.This paper makes a research of erasure codes for data repairing in erasure-coded distributed storage systems from two aspects:constructing erasure codes with feasible parameters and low repair cost,as well as reducing computation cost during the data encoding and data repairing.The main innovative contributions of the paper include three key points:Proposing a novel class of erasure codes—General Functional Regenerating(GFR)codes.With the help of a balancing parameter,GFR codes can achieve not only the minimum storage cost or the minimum repair cost in a distributed storage system,but also the trade-off between them.They also adopt a heurist method to try their best to find the least repair cost for a single node failure.Experiments show that the distributed storage system adopted GFR codes have higher reliability than that adopted RAID codes.Experiments also show GFR codes can reach the theoretially optimal/near-optimal repair cost,and their encoding performance and repairing performance are close to those of FMSR codes.Proposing a fast multiplication method of a matrix and data blocks over the Galois field—Scheduled Shift Multiplication(SSM).The SSM carefully schedules the arithmetic operations and reduce the computation cost of data encoding and data repairing for erasure codes over the Galois field.Experiments show that the SSM has less mispredictions than traditional methods,and it can accelerate not only speeds of data encoding for both RS codes and GFR codes but also data repairing for GFR codes.Experimental analysis for GFR codes'repairing shows that the SSM has litte effect to their repair cost.Proposing a novel class of erasure codes with optimal repair cost under the minimum storage—Z codes.By combining permutation matrices,Z codes can construct the generator matrices with the optimal repair cost.Z codes utilize tensor product to iteratively construct generator matrices of any parameter set,with the optimal repair cost kept.Moreover,Z codes are systematic codes,so they retain the native data after encoding.They also have other favorable properties,such as low update cost and low computation cost.Z codes have feasible parameters,and they can theoretically achieve arbitrarily high storage efficiency and failure tolerance.GZ codes are Galois-field extensions of Z codes.They possess Maximum Distance Separable(MDS)property,and maintain the optimal repair cost and systematicness of Z codes.Through experimental analyses,Z/GZ show comparable encoding speeds and repairing speeds with RS codes,but Z/GZ codes reduce over 37.5%response time during the data repairing.Both GFR code and Z code can reduce the repair cost in distributed storage systems.The former is nonsystematic,so only parity data are kept after encoding,and they can reduce the repair cost of each node in the system.While the latter is systematic,so native data are kept after encoding,and they can only reduce the repair cost of the native part.GFR codes can further reduce the repair cost by increasing the storage cost,so they are more suitable for distributed storage systems that require frequently data repairing or have big burden on data repairing.While systematicness,high coding performance and low update cost of Z codes makes them be capable to use in high performance storage systems.The SSM provides a new choice of fast computation method for data encoding and data repairing over the Ga-lois field.All three innovations can improve the repairing performance for erasure-coded distributed storage systems,and reduce the response time of data repairing and enhance the reliability.
Keywords/Search Tags:Erasure Codes, Distributed Storage System, Data Repairing
PDF Full Text Request
Related items