
Research On File Similarity-Based Deduplication In Network Backup

Posted on: 2013-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Zheng
GTID: 2248330392457818
Subject: Computer system architecture
Abstract/Summary:
De-duplication systems face a major challenge: the disk bottleneck in duplicate lookup. Traditional schemes maintain an in-RAM global index to identify whether incoming data objects are duplicates. As the data volume grows, the global index expands rapidly and exceeds the capacity of RAM, so it must reside on disk. Because disk access is significantly slower than RAM access, on-disk index lookups severely reduce the throughput of a de-duplication system. The similarity-based de-duplication technique exploits the similarity between files to group them and to reduce chunk lookups to one disk access per file. Although similarity-based de-duplication offers low RAM usage and high throughput and can greatly alleviate the disk bottleneck, it still has shortcomings that limit its application and performance in large-scale, extensible network backup systems.

To further relieve the disk lookup bottleneck and improve de-duplication throughput, this thesis analyzes the defects of similarity-based de-duplication systems in detail, incorporates some advantages of locality-based systems, and proposes a "Congregated Binning" de-duplication technique. Congregated Binning makes the following improvements over Extreme Binning: it divides or assembles files into segments; it uses a customizable strategy to congregate bins; and it caches recently accessed bins and looks up duplicates in a local hash matrix. Comparative experiments against the prevailing de-duplication techniques DDFS and Extreme Binning show that Congregated Binning performs well and features high scalability, low memory usage, and a near-exact duplicate elimination ratio.
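The lookup path described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name `SimilarityDedup`, the fixed 4-byte chunking, the choice of the minimum chunk hash as a file's representative, and the LRU policy for the bin cache are all assumptions made for illustration. The segmenting and bin-congregation strategies and the local hash matrix are not modeled. The sketch only shows how a small in-RAM index keyed by one representative hash per file, together with a cache of recently used bins, keeps on-disk lookups to at most one per file.

```python
import hashlib
from collections import OrderedDict

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only (assumption)


def chunk_hashes(data):
    """Split data into fixed-size chunks and return their SHA-1 hex digests."""
    return [
        hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


class SimilarityDedup:
    """Hypothetical similarity-based de-duplicator with a bin cache."""

    def __init__(self, cache_size=2):
        self.primary_index = {}     # in RAM: representative hash -> bin id
        self.disk_bins = {}         # stands in for on-disk bins: bin id -> chunk hashes
        self.cache = OrderedDict()  # LRU cache of recently accessed bins
        self.cache_size = cache_size
        self.disk_reads = 0         # counts simulated on-disk bin loads

    def _cache_put(self, bin_id, chunk_set):
        """Insert a bin into the cache, evicting the least recently used one."""
        self.cache[bin_id] = chunk_set
        self.cache.move_to_end(bin_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def _load_bin(self, bin_id):
        """Return a bin from the cache, or 'read' it from disk on a miss."""
        if bin_id in self.cache:
            self.cache.move_to_end(bin_id)
            return self.cache[bin_id]
        self.disk_reads += 1
        chunk_set = self.disk_bins[bin_id]
        self._cache_put(bin_id, chunk_set)
        return chunk_set

    def backup(self, data):
        """Store one file and return the number of new chunks written."""
        hashes = chunk_hashes(data)
        if not hashes:
            return 0
        rep = min(hashes)  # representative chunk hash (assumption)
        bin_id = self.primary_index.get(rep)
        if bin_id is None:
            # No similar file seen before: create a new bin, no disk read needed.
            bin_id = len(self.disk_bins)
            self.disk_bins[bin_id] = set()
            self.primary_index[rep] = bin_id
            self._cache_put(bin_id, self.disk_bins[bin_id])
        chunk_set = self._load_bin(bin_id)  # at most one disk access per file
        new_chunks = 0
        for h in hashes:
            if h not in chunk_set:
                chunk_set.add(h)  # cache holds the same set object (write-through)
                new_chunks += 1
        return new_chunks
```

A brief usage example: backing up the same file twice writes no new chunks the second time, and a disk read occurs only when the file's bin has been evicted from the cache.

```python
dedup = SimilarityDedup(cache_size=1)
dedup.backup(b"aaaabbbbccccdddd")   # four new chunks, bin 0 created and cached
dedup.backup(b"eeeeffff")           # two new chunks, bin 1 evicts bin 0
dedup.backup(b"aaaabbbbccccdddd")   # no new chunks, one simulated disk read
```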
Keywords/Search Tags: De-duplication, Data backup, Disk bottleneck, Similarity, Locality