
Research On Scaling And Repair Performance For Erasure Coded Cloud Storage Systems

Posted on: 2021-07-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X Y Zhang    Full Text: PDF
GTID: 1488306107957439    Subject: Computer system architecture
Abstract/Summary:
With the increasing scale of cloud storage, the risk of data loss in cloud storage systems is also increasing, and the data reliability of cloud storage has become a hot topic in both academia and industry. To address this problem, cloud storage systems often adopt erasure coding for its low storage overhead. Unlike general-purpose storage systems, cloud storage systems must meet the complex and changing storage needs of massive numbers of users while providing 24/7, highly available storage services. This raises two critical scientific issues when applying erasure coding to cloud storage: one is the contradiction between the poor storage-scaling performance of erasure coding and the frequently changing scaling requirements in the cloud; the other is the contradiction between the poor data-repair performance of erasure coding and the high service availability required in the cloud. This dissertation therefore focuses on the scaling and repair performance of erasure-coded cloud storage systems, in the following four aspects.

Erasure-coded storage scaling often changes the coding parameters and triggers a large number of parity updates, which inevitably consume substantial network bandwidth and degrade the system's ability to serve requests. To address this problem, the scaling of the widely used Reed-Solomon (RS) codes is studied. In theory, a lower bound on the amount of data transferred during scaling (i.e., the scaling bandwidth) is derived using the information flow graph model, and a family of MDS code constructions that achieve the optimal scaling bandwidth is proposed. A fast network-coding-based scaling algorithm is then designed to achieve optimal or near-optimal scaling bandwidth, and a distributed storage system prototype, NCScale, is implemented on top of it. Experiments on the Amazon EC2 cloud platform show that NCScale reduces scaling time by up to 50% over the state-of-the-art Scale-RS.

Because cloud storage demands high availability, a new family of erasure codes, regenerating codes, has attracted widespread attention: they greatly reduce the bandwidth consumed by repair operations and thus improve data availability. Existing studies on regenerating codes focus mainly on data repair, while storage scaling for regenerating codes remains challenging. To address this problem, storage scaling is studied for two families of regenerating codes, MBR codes and MSR codes, and a scaling scheme is proposed for each. To reduce the scaling bandwidth, the schemes exploit local updates and the structural features of the two code constructions. Both schemes are implemented atop the Hadoop Distributed File System (HDFS) and tested on Amazon EC2. The results show that the scaling bandwidth can be reduced to 66.5% and 43.5%, respectively, of that of current centralized scaling.

Most existing repair schemes for erasure codes are designed to repair data quickly in a homogeneous, static network, and they cope poorly with the heterogeneous, rapidly changing networks of cloud storage. To address this problem, the repair of erasure codes in heterogeneous cloud networks is studied, and a flexible tree-based pipelined repair scheme, FTPRepair, is proposed. FTPRepair uses a tree structure to avoid congested links and repair quickly in heterogeneous networks; it further leverages Software-Defined Networking to enable slice-level repair that flexibly adapts to rapidly changing bandwidth. FTPRepair is prototyped atop Mininet and ECPipe and tested on Amazon EC2. Simulations and experiments show that FTPRepair significantly improves degraded-read and full-node-repair performance over traditional repair and Repair-Pipelining.

Cloud storage systems usually use massive numbers of disks to store massive data, which greatly increases the frequency of disk failures; moreover, these failures are unevenly distributed, which hurts the high availability of cloud storage. To address this problem, disk failure prediction is integrated with the locally repairable codes (LRC) used in Microsoft Azure cloud storage, yielding a proactive LRC (pLRC) method. pLRC uses a decision-tree-based disk failure predictor to dynamically adjust the size of each LRC group, so that data blocks that are more likely to fail can be repaired faster within smaller groups. The data reliability of pLRC is analyzed with the MTTDL model; the results show that pLRC improves data reliability by 113% over LRC. pLRC is implemented atop HDFS and tested on Amazon EC2. The experimental results show that pLRC improves degraded-read and disk-repair performance by 46.8% and 47.5% over LRC.
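The core intuition behind bandwidth-efficient scaling, updating parities from deltas rather than re-encoding from scratch, can be illustrated with a toy single-XOR-parity code. This is a minimal sketch of the general idea only, not the dissertation's MDS construction or its network-coding algorithm; all function names and the cost accounting (blocks transferred to the parity node) are illustrative assumptions.

```python
# Toy illustration of erasure-coded storage scaling with one XOR parity
# block over k data blocks, scaled to k+1 data blocks.

def xor_blocks(a, b):
    # byte-wise XOR of two equal-length blocks
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(data_blocks):
    # parity = XOR of all data blocks
    parity = bytes(len(data_blocks[0]))
    for blk in data_blocks:
        parity = xor_blocks(parity, blk)
    return parity

def scale_naive(data_blocks, new_block):
    # naive scaling: re-read every block and re-encode;
    # transfers k+1 blocks to the parity node
    return encode_parity(data_blocks + [new_block]), len(data_blocks) + 1

def scale_delta(old_parity, new_block):
    # delta-based scaling: XOR only the new block into the old parity;
    # transfers 1 block -- the spirit of bandwidth-optimal scaling
    return xor_blocks(old_parity, new_block), 1

data = [bytes([i] * 4) for i in (1, 2, 3)]
p_old = encode_parity(data)                 # 1 ^ 2 ^ 3 = 0
new = bytes([7] * 4)
p_naive, cost_naive = scale_naive(data, new)
p_delta, cost_delta = scale_delta(p_old, new)
assert p_naive == p_delta                   # same parity either way
assert cost_delta < cost_naive              # far less data moved
```

The same parity results from either path, but the delta path moves one block instead of four; the dissertation's constructions pursue this kind of saving for general MDS and regenerating codes.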
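The advantage of pipelined repair over conventional star repair can also be sketched with a simple bandwidth model. This is a back-of-the-envelope model under stated assumptions (a single bottleneck link, no computation or propagation cost, equal-size slices), not FTPRepair's actual tree-construction or SDN logic; all numbers are illustrative.

```python
# Toy repair-time model: repairing one block from k helpers
# over heterogeneous links.

def star_repair_time(block_size, k, downlink_bw):
    # conventional star repair: the requestor pulls k full blocks
    # through its single downlink
    return k * block_size / downlink_bw

def pipeline_repair_time(block_size, link_bws, num_slices):
    # pipelined repair: the block is cut into slices relayed along a
    # chain of n links; slices overlap in time, so the total is
    # (n + s - 1) slice transmissions over the bottleneck link
    slice_size = block_size / num_slices
    bottleneck = min(link_bws)
    n = len(link_bws)
    return (n + num_slices - 1) * slice_size / bottleneck

# heterogeneous link bandwidths in MB/s, k = 4 helpers, 64 MB block
links = [40.0, 100.0, 80.0, 60.0]
t_star = star_repair_time(64.0, 4, 40.0)        # 6.4 s
t_pipe = pipeline_repair_time(64.0, links, 32)  # 1.75 s
assert t_pipe < t_star
```

With many slices the pipeline's time approaches a single block transfer over the bottleneck link, roughly a k-fold speedup over star repair, which is why slice-level scheduling over good (uncongested) links pays off.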
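The proactive idea behind pLRC, steering blocks on at-risk disks into smaller local groups so their degraded reads touch fewer surviving blocks, can be sketched as follows. The predictor here is a stand-in threshold rule, not the dissertation's decision-tree model; the SMART attribute name, the threshold, and the group sizes are all illustrative assumptions.

```python
# Hypothetical sketch of proactive group sizing in an LRC-style layout.

def predict_failure(smart):
    # stand-in for a decision-tree disk-failure predictor: flag a disk
    # whose reallocated-sector count is high (threshold is illustrative)
    return smart["reallocated_sectors"] > 100

def assign_group_size(disks, small=4, large=8):
    # predicted-to-fail disks get a smaller local group, so a degraded
    # read decodes from group_size - 1 blocks instead of large - 1
    return {name: (small if predict_failure(smart) else large)
            for name, smart in disks.items()}

fleet = {
    "disk-a": {"reallocated_sectors": 3},    # healthy
    "disk-b": {"reallocated_sectors": 250},  # likely to fail soon
}
groups = assign_group_size(fleet)
assert groups == {"disk-a": 8, "disk-b": 4}
```

Shrinking the group for the at-risk disk halves the number of blocks read during its repair in this toy setting, which mirrors how pLRC trades a small amount of extra parity for faster repair exactly where failures are expected.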
Keywords/Search Tags:Cloud Storage, Erasure Coding, Erasure Coded Storage Scaling, Erasure Coded Data Repair, Distributed Storage System