
Offline Archiving Optimization For Erasure-Coded Cluster

Posted on: 2017-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wang
Full Text: PDF
GTID: 2348330503489795
Subject: Computer Science and Technology
Abstract/Summary:
Erasure codes combine low storage overhead with high fault tolerance, so they can be used to archive infrequently accessed data replicas, which both preserves data availability and improves storage utilization. According to the rules for placing data blocks, archiving can be divided into fixed-layout data archiving and random-layout data archiving.

In fixed-layout data archiving, conventional synchronous coding schemes encode centrally, so reads and writes compete for the same disks and network bandwidth becomes the archiving bottleneck. To overcome these disadvantages, we use a pipeline coding approach to accelerate data archival in a storage cluster. First, a chained-declustering mechanism is applied to mirrored RAID-5 and triplication redundancy groups, yielding two new data layouts, [D+P]cd and [3X]cd respectively. Second, based on the [D+P]cd and [3X]cd layouts, we design two archiving solutions, DP and 3X, which exhibit three salient features: (i) data locality: each node involved in encoding reads two or three local data blocks; (ii) dispersed computational load: the encoding operation is spread across the k data nodes; and (iii) parallel archiving: two or three encoding pipelines are deployed to generate the parity blocks. DP, 3X and three existing solutions (i.e., SynE, DE and RapidRAID) are implemented in a real storage cluster. Experimental results show that, in a 9-node storage cluster, the archival time of our solutions is better than that of the three other archiving solutions by a factor of at least 3.41.

In random-layout data archiving, traditional archiving solutions split large blocks into smaller data blocks and follow the data locality of the distributed file system, which leads to random reads and unbalanced task assignment. In this work, prefetching is applied to the original random-layout archiving solutions (CArch, DArch and BArch), and three improved prefetching archiving solutions (PCArch, PDArch and PBArch) are proposed. All six random-layout archiving solutions are implemented in a real storage cluster. Experimental results show that prefetching effectively improves the performance of random-layout archiving: the performance of the prefetching archiving solutions is at least 1.62 times that of the non-prefetching archiving solutions.
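The contrast between centralized and pipeline coding in the fixed-layout case can be illustrated with a minimal sketch. It is not the DP or 3X implementation from the thesis. It assumes a parity block that is a linear combination of k data blocks over GF(2^8), uses arbitrary illustrative coefficients, and models network cost only as the number of bytes a single node must receive. All function names here are hypothetical.

```python
from typing import List, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the reducing polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def fold(partial: bytes, block: bytes, coeff: int) -> bytes:
    """Add coeff * block to a partial parity (addition in GF(2^8) is XOR)."""
    return bytes(p ^ gf_mul(x, coeff) for p, x in zip(partial, block))


def centralized_encode(blocks: List[bytes], coeffs: List[int]) -> Tuple[bytes, int]:
    """One encoder pulls all k blocks over the network, then computes the parity.

    Assumption: the encoder holds none of the blocks locally.
    Returns (parity, bytes received by the encoder).
    """
    inbound = sum(len(b) for b in blocks)
    parity = bytes(len(blocks[0]))
    for block, c in zip(blocks, coeffs):
        parity = fold(parity, block, c)
    return parity, inbound


def pipelined_encode(blocks: List[bytes], coeffs: List[int]) -> Tuple[bytes, int]:
    """Node i reads only its local block, folds it in, and forwards the partial.

    Returns (parity, largest number of bytes received by any single data node).
    """
    partial = bytes(len(blocks[0]))
    max_inbound = 0
    for i, (local_block, c) in enumerate(zip(blocks, coeffs)):
        if i > 0:  # every node after the first receives one partial parity
            max_inbound = max(max_inbound, len(partial))
        partial = fold(partial, local_block, c)
    return partial, max_inbound


if __name__ == "__main__":
    data = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
    coefficients = [1, 2, 3]  # illustrative values, not a real code's matrix
    print(centralized_encode(data, coefficients))
    print(pipelined_encode(data, coefficients))
```

Both functions produce the same parity because the linear combination can be accumulated hop by hop. Under this simplified model the centralized encoder receives k blocks, while each pipeline node receives a single block-sized partial result.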
Keywords/Search Tags: Erasure code, Archiving, Fixed-layout, Random-layout, Pipeline coding