
Research on Data De-duplication Technology Based on the MBasedSWC-Varsize Algorithm

Posted on: 2013-08-25  Degree: Master  Type: Thesis
Country: China  Candidate: B Cai  Full Text: PDF
GTID: 2248330374485997  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of information technology, data has become essential to enterprise operations. In recent years, data volumes have grown explosively, rising in many industries from the GB or TB level to PB or even EB. This is particularly evident in sectors such as banking and e-commerce, where data is equated with wealth. To guard against data loss that could ruin the business, more and more enterprises are building their own disaster recovery systems. However, if such huge volumes of data are backed up by simple replication alone, the backups place a serious burden on storage and the network. How to build a disaster recovery system while optimizing resource usage therefore became a pressing problem, and the concept of data de-duplication emerged in response.

The effectiveness of duplicate elimination is determined by the de-duplication algorithm: an efficient algorithm achieves a higher compression ratio. The main approaches are de-duplication at the level of whole files, de-duplication based on fixed-size blocks, and de-duplication based on content-defined blocks. The smaller the block granularity, the better the elimination result, but the greater the memory consumption and disk-management overhead.

This thesis focuses on two improved content-based de-duplication algorithms, MBasedSWC and MBasedSWC-Varsize, and combines them with the de-duplication storage model FSBSM. The proposed algorithms build on the observation that, in practical applications, most duplicate data between file versions is contiguous. A pre-blocking and sub-block merging strategy is adopted to preserve a good compression ratio while solving the problem that content-defined chunking produces volatile block sizes. The algorithms also avoid the way similar algorithms trade memory and disk consumption for compression ratio, thereby balancing de-duplication performance. Combined with the FSBSM storage model, including file-similarity judgment, a double-layer storage structure, and node-selection strategies, the value of data de-duplication is maximized in a networked cluster environment.

Finally, combining these theoretical studies, we designed and implemented a prototype storage subsystem for disaster recovery. The system tests the improved algorithms on real data, compares the results with the simulations in Chapter 3, and ultimately uses the algorithms to realize the backup and recovery functions of the prototype subsystem. The experimental results show that MBasedSWC-Varsize under the FSBSM model achieves the desired performance and has been successfully applied in the disaster recovery storage subsystem prototype.
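To make the content-defined approach concrete, the following Python sketch shows generic sliding-window chunking with a rolling hash. It is a minimal illustration of the basic technique, not the thesis's MBasedSWC-Varsize algorithm (whose pre-blocking and sub-block merging details are not given here); the window width, boundary mask, and chunk-size bounds are assumed values, with the minimum and maximum bounds playing the same role the thesis ascribes to taming block-size volatility.

    import hashlib

    PRIME, MOD = 31, (1 << 61) - 1
    WINDOW = 48                # sliding-window width in bytes (assumed)
    MASK = 0x1FFF              # boundary when (hash & MASK) == 0 -> avg ~8 KiB chunks
    MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024   # bounds curb size volatility
    POW_W = pow(PRIME, WINDOW, MOD)              # PRIME**WINDOW for O(1) slide

    def chunk(data: bytes):
        """Yield (offset, length, sha1-hex) for each content-defined chunk."""
        h, start = 0, 0
        for i, b in enumerate(data):
            h = (h * PRIME + b) % MOD            # add the incoming byte
            if i - start >= WINDOW:              # window full: drop oldest byte
                h = (h - data[i - WINDOW] * POW_W) % MOD
            size = i - start + 1
            if ((h & MASK) == 0 and size >= MIN_CHUNK) or size >= MAX_CHUNK:
                yield start, size, hashlib.sha1(data[start:i + 1]).hexdigest()
                start, h = i + 1, 0              # restart the window in the next chunk
        if start < len(data):                    # emit the tail chunk
            yield start, len(data) - start, hashlib.sha1(data[start:]).hexdigest()

Because chunk boundaries depend on content rather than on fixed offsets, inserting or deleting a few bytes in a new file version shifts only the chunks near the edit, so most chunks between versions keep identical fingerprints and can be eliminated as duplicates.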
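Given such chunks, duplicate elimination itself reduces to indexing chunks by a collision-resistant fingerprint and storing each unique chunk once. The toy index below (again a generic sketch, not the FSBSM model's double-layer structure or node-selection strategy) shows the idea, reusing the hypothetical chunk() function from the sketch above.

    class ChunkStore:
        """Toy fingerprint index: each unique chunk is stored once; a file
        is recorded as the ordered list of its chunk fingerprints."""
        def __init__(self):
            self.chunks = {}       # sha1 hex -> chunk bytes
            self.recipes = {}      # file name -> list of sha1 hex

        def backup(self, name: str, data: bytes):
            recipe = []
            for off, size, fp in chunk(data):
                self.chunks.setdefault(fp, data[off:off + size])  # skip duplicates
                recipe.append(fp)
            self.recipes[name] = recipe

        def restore(self, name: str) -> bytes:
            return b"".join(self.chunks[fp] for fp in self.recipes[name])

Backing up successive versions of a file through such a store writes only the chunks that actually changed, which is the resource saving that motivates de-duplicating disaster recovery systems.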
Keywords/Search Tags: Disaster recovery, Data deduplication, MBasedSWC-Varsize, FSBSM