
Research On Routing Algorithm For Distributed Data Deduplication Systems

Posted on: 2018-04-25; Degree: Master; Type: Thesis
Country: China; Candidate: Z M Wang
GTID: 2348330536968737; Subject: Engineering
Abstract/Summary:
As data grows exponentially, distributed deduplication storage systems face significant challenges in ensuring a high deduplication ratio, high throughput, and load balancing. Existing approaches mainly improve these metrics by proposing novel data routing algorithms or by optimizing the efficiency of the fingerprint query process. Proposed data routing algorithms fall into two kinds: stateless routing and stateful routing. Stateless routing cannot achieve a high deduplication ratio or sustain load balancing because it does not consider the data actually stored on each storage node, while stateful routing spends considerable time comparing the data in the backup stream against each storage node and therefore achieves low deduplication throughput.

Based on these observations, we propose a routing algorithm that combines Fingerprint Sampling and Fragmentation Reduction (FSFR) to improve both deduplication throughput and deduplication ratio while sustaining load balancing. It works as follows:
(1) The backup client pre-processes the data by merging consecutive data chunks into super-chunks, then extracts sampled fingerprints by systematic sampling.
(2) The sampled fingerprints are queried against a Bloom filter on each storage node to find redundant fingerprints, and the ratio of the deduplication ratio to the storage utilization is calculated for each node.
(3) The backup client selects the top K storage nodes according to the ratio calculated in step (2), then sends all the fingerprints of each super-chunk to these K nodes to find the data fragments.
(4) Finally, the backup client selects the storage node with the fewest fragments as the routing target and sends all the data to that node for deduplication.

We conducted extensive experiments with real workloads to assess the benefits of the FSFR routing algorithm. The results show that FSFR outperforms existing routing algorithms in both deduplication throughput and deduplication ratio; its deduplication throughput is up to 50% higher than that of the EMC stateful algorithm and the Boafft algorithm.
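The four FSFR routing steps can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not fix: super-chunks are fixed groups of 8 chunks fingerprinted with SHA-1, systematic sampling keeps every 4th fingerprint starting at position 0, each node keeps an exact fingerprint set plus a Bloom filter summary, and the "fragment" count is a hypothetical metric (the number of maximal runs of chunks that are all duplicates or all new on a node). All names and parameters (`BloomFilter`, `StorageNode`, `route`, `backup`, `top_k`, `interval`) are illustrative, not taken from the thesis.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Minimal Bloom filter: m_bits bits, num_hashes SHA-256-derived positions."""

    def __init__(self, m_bits: int = 8192, num_hashes: int = 4) -> None:
        self.m_bits = m_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


@dataclass
class StorageNode:
    name: str
    capacity: int  # capacity measured in chunks (hypothetical unit)
    index: set = field(default_factory=set)  # exact on-node fingerprint index
    summary: BloomFilter = field(default_factory=BloomFilter)  # in-memory summary

    def utilization(self) -> float:
        return len(self.index) / self.capacity

    def store(self, fps) -> int:
        """Deduplicate fps against the index; return the number of new chunks written."""
        new = [fp for fp in dict.fromkeys(fps) if fp not in self.index]
        for fp in new:
            self.index.add(fp)
            self.summary.add(fp)
        return len(new)


def super_chunks(chunks, chunks_per_super: int = 8):
    """Step 1a: fingerprint chunks (SHA-1 assumed) and group consecutive ones."""
    fps = [hashlib.sha1(c).digest() for c in chunks]
    return [fps[i:i + chunks_per_super] for i in range(0, len(fps), chunks_per_super)]


def systematic_sample(fps, interval: int = 4):
    """Step 1b: systematic sampling, keeping every interval-th fingerprint from 0."""
    return fps[::interval]


def score(node: StorageNode, samples, eps: float = 1e-6) -> float:
    """Step 2: estimated dedup ratio (Bloom hits / samples) over storage utilization."""
    hits = sum(1 for fp in samples if fp in node.summary)
    dedup_ratio = hits / len(samples) if samples else 0.0
    return dedup_ratio / (node.utilization() + eps)


def fragments(node: StorageNode, fps) -> int:
    """Step 3 (hypothetical metric): maximal runs of chunks that are all duplicates
    or all new on this node; fewer runs means a less scattered layout."""
    dup = [fp in node.index for fp in fps]
    return sum(1 for i in range(len(dup)) if i == 0 or dup[i] != dup[i - 1])


def route(sc, nodes, top_k: int = 2, interval: int = 4) -> StorageNode:
    samples = systematic_sample(sc, interval)
    # Highest score first; ties go to the less utilized node (load balancing).
    ranked = sorted(nodes, key=lambda n: (-score(n, samples), n.utilization()))
    # Step 4: fewest fragments among the top K; ties favour more duplicates.
    return min(ranked[:top_k],
               key=lambda n: (fragments(n, sc), -sum(fp in n.index for fp in sc)))


def backup(chunks, nodes, top_k: int = 2):
    """Route every super-chunk, deduplicate it on its target, return placements."""
    placements = []
    for sc in super_chunks(chunks):
        target = route(sc, nodes, top_k)
        target.store(sc)
        placements.append(target.name)
    return placements
```

In a small run with 32 distinct chunks and three empty nodes of capacity 1000, the four super-chunks land on node0, node1, node2 and node0; repeating the same backup routes each super-chunk back to the node that already holds it, so no new chunks are written (barring a Bloom false positive). Ranking ties are broken by lower utilization, which is how this sketch spreads new data across empty nodes; the thesis may balance load differently.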
Keywords/Search Tags: Data deduplication, Distributed Deduplication Storage System, Routing Algorithm, Fingerprint sampling, Fragmentation reduction