Research On Deterministic Construction Of Replica Placement Scheme

Posted on: 2021-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Gao
Full Text: PDF
GTID: 2428330602494307
Subject: Cyberspace security
Abstract/Summary:
With economic and social development, the scale of data keeps growing, and massive amounts of data are kept in storage systems. In large storage systems, however, node failure is the norm rather than the exception. To guard against data loss caused by node failures, storage systems generally adopt data redundancy mechanisms, trading storage utilization for better system reliability and data availability. Data redundancy mechanisms include the erasure code mechanism and the backup mechanism. The backup mechanism creates multiple copies of each data block and stores them on different nodes. An important research question for the backup mechanism is the replication placement scheme, that is, how to choose the storage nodes that hold the replicas of each data block. Common replication placement schemes include Random Replication, Copyset Replication and Tiered Replication. Among them, Copyset Replication is a general-purpose replication placement scheme. Compared with earlier schemes, it provides a nearly optimal trade-off between the number of nodes across which data is scattered and the probability of data loss. However, it builds the copysets (a copyset is a group of storage nodes containing all replicas of a data block) with a trial-and-error algorithm, which makes the running time and the termination condition hard to predict; this greatly affects the performance of the algorithm and can even render it unusable.

In this thesis, we propose two replication placement schemes. (1) The first is the Deterministic Schemes of Copyset Replication, which directly gives a method for building the copysets in linear time. This avoids the long build time and the lack of a clear termination condition caused by the trial-and-error algorithm in the original Copyset Replication, while maintaining the same data loss probability as the original scheme. (2) The second, called G-Scheme, is also a deterministic scheme that builds copysets in linear time. When the parameters satisfy (N, R, S) = (l(l-1)/2, l-1, 2(l-2)), we prove that the G-Scheme builds the minimum number of copysets, achieving the theoretically lowest probability of data loss. In particular, the G-Scheme can generate optimal results that Copyset Replication cannot generate.

Finally, we compare HDFS Random Replication, Copyset Replication, Tiered Replication, the Deterministic Schemes of Copyset Replication and the G-Scheme, analyzing the advantages and disadvantages of each.
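To make the parameter condition concrete, the following is a minimal sketch of one deterministic construction that happens to match (N, R, S) = (l(l-1)/2, l-1, 2(l-2)). It is not taken from the thesis and may differ from the actual G-Scheme. It assumes the usual Copyset Replication reading of the symbols (N storage nodes, replication factor R, scatter width S) and identifies each storage node with an edge of the complete graph on l vertices, using one copyset per vertex. The names `triangular_copysets` and `scatter_width` are hypothetical.

```python
from itertools import combinations


def triangular_copysets(l):
    """Illustrative construction (an assumption, not the thesis's algorithm).

    Storage nodes are the edges of the complete graph K_l, so N = l(l-1)/2.
    The copyset for vertex v contains the l-1 edges incident to v, so R = l-1.
    """
    edges = list(combinations(range(l), 2))
    node_id = {edge: i for i, edge in enumerate(edges)}
    copysets = []
    for v in range(l):
        # Every edge touching v, stored with its endpoints in sorted order.
        members = [node_id[(min(u, v), max(u, v))] for u in range(l) if u != v]
        copysets.append(sorted(members))
    return len(edges), copysets


def scatter_width(node, copysets):
    """Number of distinct other nodes that share at least one copyset with `node`."""
    peers = set()
    for copyset in copysets:
        if node in copyset:
            peers.update(copyset)
    peers.discard(node)
    return len(peers)


l = 6
n, copysets = triangular_copysets(l)
widths = {scatter_width(node, copysets) for node in range(n)}
print(n, len(copysets), len(copysets[0]), widths)
```

In this sketch each node belongs to exactly two copysets (one per endpoint of its edge), and those two copysets share no other node, so every node reaches 2(l-2) peers using only l copysets in total. The construction runs in time linear in the size of its output and needs no retries, which is the kind of behavior the abstract contrasts with trial-and-error construction.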
Keywords/Search Tags: Backup Mechanism, Replication Placement Scheme, Deterministic Algorithm, Data Loss Probability