
The Design And Implementation Of Disaster Backup System Data Deduplication Engine

Posted on: 2015-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2298330452961282
Subject: Software engineering
Abstract/Summary:
With the development of the information age, data has become central to human society, and data security is receiving growing attention; disaster-recovery backup solutions have therefore been proposed for data protection. With the arrival of the big-data era, however, data grows so fast that disaster-recovery plans must keep adding physical storage devices and network bandwidth. Expensive storage hardware and high maintenance greatly increase operating costs. Yet at a fine-grained level, much of this data is duplicated or redundant and does not need to be transmitted or protected again. Designing an engine that eliminates redundant data for the disaster-recovery system, without compromising recovery itself, is therefore an effective way to curb data growth and cut costs, and the development team accordingly set the requirement to build a data de-duplication engine.

From the perspective of the storage industry's development, this thesis analyses current trends and the state of research in data de-duplication among leading foreign storage vendors, and proposes the design of the de-duplication system.

The engine is divided into a client and a server. The thesis describes in detail the algorithm that cuts coarse-grained disaster-recovery data into fine-grained chunks, an improved LRU caching algorithm that speeds up duplicate detection on the client, a double-caching technique that maintains a Bloom filter together with a local cache to reduce server disk I/O, and a double index that supports fast queries. The engine is designed and implemented in C++ on a client/server (C/S) model.

On the client, fingerprint calculation, data fragmentation, and the local fingerprint cache are the keys to performance: the fingerprint guarantees the uniqueness of each chunk, different chunking algorithms yield different de-duplication ratios, and the local cache accelerates duplicate detection.

The server is the sole authority for deciding whether a chunk is already stored. A Bloom filter combined with a maintained local cache forms the double-caching method that accelerates this duplicate judgement, and a double index for fast data queries is built on the principle of data locality.

The thesis also optimises the threading models of the client and the server so that threads no longer block and wait on access to the same memory, further improving the engine's performance. Finally, SSDs are introduced to relieve the disk I/O bottleneck in de-duplication.

Testing shows that the engine performs well on mass data in disaster recovery: it not only ensures correct backup and restore, but also saves physical storage space, reduces the bandwidth consumed, lowers cost, and speeds up disaster recovery overall.
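The abstract does not reproduce the chunking algorithm itself. As a minimal sketch of the general idea in C++, the fragment below cuts a byte stream into variable-size chunks with a simple polynomial rolling hash and fingerprints each one; the window size, boundary mask, and the std::hash stand-in for a cryptographic fingerprint are illustrative assumptions, not the engine's actual parameters.

    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <string>
    #include <vector>

    static constexpr size_t   WINDOW = 48;             // rolling window (assumed)
    static constexpr uint64_t MASK   = (1u << 12) - 1; // ~4 KiB average chunk (assumed)
    static constexpr uint64_t BASE   = 257;

    // Cut `data` into variable-size chunks. A boundary is declared wherever
    // the rolling hash of the last WINDOW bytes matches MASK, so an insertion
    // early in the stream does not shift every later chunk boundary.
    std::vector<std::string> chunkStream(const std::string& data) {
        uint64_t pow_w = 1;
        for (size_t i = 0; i < WINDOW; ++i) pow_w *= BASE;   // BASE^WINDOW mod 2^64

        std::vector<std::string> chunks;
        uint64_t hash = 0;
        size_t start = 0;
        for (size_t i = 0; i < data.size(); ++i) {
            hash = hash * BASE + (unsigned char)data[i];
            if (i >= start + WINDOW)                 // drop the byte leaving the window
                hash -= pow_w * (unsigned char)data[i - WINDOW];
            if (i + 1 - start >= WINDOW && (hash & MASK) == MASK) {
                chunks.push_back(data.substr(start, i + 1 - start));
                start = i + 1;
                hash = 0;
            }
        }
        if (start < data.size()) chunks.push_back(data.substr(start));
        return chunks;
    }

    int main() {
        std::string data = "example backup stream ...";     // toy input
        for (const std::string& c : chunkStream(data)) {
            // std::hash is only a stand-in for a real fingerprint such as SHA-1.
            size_t fp = std::hash<std::string>{}(c);
            std::printf("chunk of %zu bytes, fingerprint %zx\n", c.size(), fp);
        }
        return 0;
    }

Content-defined boundaries of this kind are what make fine-grained de-duplication effective: identical content produces identical chunks, and therefore identical fingerprints, wherever it appears in the stream.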
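The improved LRU algorithm is likewise not given in the abstract; the sketch below shows only the plain LRU baseline it builds on, with a hypothetical checkAndInsert interface. A hit means the fingerprint was recently confirmed as stored, so the client can skip the chunk without a round trip to the server.

    #include <list>
    #include <string>
    #include <unordered_map>

    class LruFingerprintCache {
        size_t capacity_;
        std::list<std::string> order_;   // most recently used at the front
        std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
    public:
        explicit LruFingerprintCache(size_t capacity) : capacity_(capacity) {}

        // Returns true if fp is already cached (a duplicate chunk); either way,
        // fp becomes the most recently used entry afterwards.
        bool checkAndInsert(const std::string& fp) {
            auto it = pos_.find(fp);
            if (it != pos_.end()) {
                order_.splice(order_.begin(), order_, it->second); // refresh recency
                return true;
            }
            order_.push_front(fp);
            pos_[fp] = order_.begin();
            if (pos_.size() > capacity_) {           // evict least recently used
                pos_.erase(order_.back());
                order_.pop_back();
            }
            return false;
        }
    };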
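A hedged sketch of how the server's double caching could be arranged: the Bloom filter answers "definitely new" from memory, and only a possible duplicate falls through to the local fingerprint cache and, on a miss there, to the on-disk index. The three-hash salting scheme, the bit-array size, and queryDiskIndex are assumptions for illustration, not the thesis's exact design.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <unordered_set>
    #include <vector>

    class ServerDuplicateJudge {
        static constexpr size_t BITS = 1 << 24;          // bit-array size (assumed)
        std::vector<bool> bloom_ = std::vector<bool>(BITS);
        std::unordered_set<std::string> localCache_;     // hot fingerprints in RAM

        // k hash functions derived from one hash by salting (illustrative).
        size_t slot(const std::string& fp, char seed) const {
            return std::hash<std::string>{}(fp + seed) % BITS;
        }
    public:
        // Returns true if fp duplicates a stored chunk. Only a Bloom-filter
        // false positive that also misses the local cache costs a disk read.
        bool isDuplicate(const std::string& fp) {
            bool maybe = bloom_[slot(fp, 'a')] && bloom_[slot(fp, 'b')]
                      && bloom_[slot(fp, 'c')];
            if (!maybe) {                                // definitely never stored
                bloom_[slot(fp, 'a')] = bloom_[slot(fp, 'b')]
                                      = bloom_[slot(fp, 'c')] = true;
                localCache_.insert(fp);  // real code would also update the disk index
                return false;
            }
            if (localCache_.count(fp)) return true;      // confirmed without disk I/O
            bool onDisk = queryDiskIndex(fp);            // hypothetical disk lookup
            if (onDisk) localCache_.insert(fp);          // promote the hot fingerprint
            return onDisk;
        }
    private:
        bool queryDiskIndex(const std::string&) const { return false; } // stub
    };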
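The abstract does not detail the double index. One common locality-based reading (an interpretation, not the thesis's exact design) pairs a fingerprint-to-container map with a per-container fingerprint list, so that a single hit prefetches the neighbours that the next chunks of the same backup stream are likely to match.

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>

    struct DoubleIndex {
        // First index: fingerprint -> container id. Kept small in RAM
        // (a real system might sample it or page it from disk).
        std::unordered_map<std::string, uint32_t> fpToContainer;
        // Second index: container id -> every fingerprint in that container.
        // In a real system this level would live on disk.
        std::unordered_map<uint32_t, std::unordered_set<std::string>> byContainer;
        std::unordered_set<std::string> prefetched;   // fingerprints pulled into RAM

        bool lookup(const std::string& fp) {
            if (prefetched.count(fp)) return true;        // resolved in memory
            auto it = fpToContainer.find(fp);
            if (it == fpToContainer.end()) return false;  // genuinely new chunk
            // Data locality: chunks written together share a container, so one
            // hit prefetches that container's fingerprints for its neighbours.
            const auto& fps = byContainer[it->second];
            prefetched.insert(fps.begin(), fps.end());
            return true;
        }
    };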
Keywords/Search Tags: De-duplication engine, Backup, Data fragmentation, Double cache, Double index