
Research on Data De-duplication Technology Based on the MBasedSWC-Varsize Algorithm

Posted on: 2013-08-25  Degree: Master  Type: Thesis
Country: China  Candidate: B Cai  Full Text: PDF
GTID: 2248330374485997  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of information technology, data has become essential to enterprise operations. In recent years, data volumes have grown explosively, rising in many industries from the GB or TB level to PB or even EB. This is particularly evident in sectors such as banking and e-commerce, where data is equated with wealth. To guard against data loss that could ruin the business, more and more enterprises are building their own disaster recovery systems. However, if such huge volumes of data are backed up by simple replication alone, the backups place a serious burden on storage and the network. How to build a disaster recovery system while optimizing resource usage therefore became a pressing problem, and the concept of data de-duplication emerged in response.

The effectiveness of duplicate elimination is determined by the de-duplication algorithm: an efficient algorithm achieves a higher compression ratio. The main approaches are de-duplication at the level of whole files, de-duplication based on fixed-size blocks, and de-duplication based on content-defined blocks. The smaller the block granularity, the better the elimination result, but the greater the memory consumption and disk-management overhead.

This thesis focuses on two improved content-based de-duplication algorithms, MBasedSWC and MBasedSWC-Varsize, and combines them with the de-duplication storage model FSBSM. The proposed algorithms build on the observation that, in practical applications, most duplicate data between file versions is contiguous. A pre-blocking and sub-block merging strategy is adopted to preserve a good compression ratio while solving the problem that content-defined chunking produces volatile block sizes. The algorithms also avoid the way similar algorithms trade memory and disk consumption for compression ratio, thereby balancing de-duplication performance. Combined with the FSBSM storage model, including file-similarity judgment, a double-layer storage structure, and node-selection strategies, the value of data de-duplication is maximized in a networked cluster environment.

Finally, combining these theoretical studies, we designed and implemented a prototype storage subsystem for disaster recovery. The system tests the improved algorithms on real data, compares the results with the simulations in Chapter 3, and ultimately uses the algorithms to realize the backup and recovery functions of the prototype subsystem. The experimental results show that MBasedSWC-Varsize under the FSBSM model achieves the desired performance and has been successfully applied in the disaster recovery storage subsystem prototype.
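To make the content-defined approach concrete, the following Python sketch shows generic sliding-window chunking with a rolling hash. It is a minimal illustration of the basic technique, not the thesis's MBasedSWC-Varsize algorithm (whose pre-blocking and sub-block merging details are not given here); the window width, boundary mask, and chunk-size bounds are assumed values, with the minimum and maximum bounds playing the same role the thesis ascribes to taming block-size volatility.

    import hashlib

    PRIME, MOD = 31, (1 << 61) - 1
    WINDOW = 48                # sliding-window width in bytes (assumed)
    MASK = 0x1FFF              # boundary when (hash & MASK) == 0 -> avg ~8 KiB chunks
    MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024   # bounds curb size volatility
    POW_W = pow(PRIME, WINDOW, MOD)              # PRIME**WINDOW for O(1) slide

    def chunk(data: bytes):
        """Yield (offset, length, sha1-hex) for each content-defined chunk."""
        h, start = 0, 0
        for i, b in enumerate(data):
            h = (h * PRIME + b) % MOD            # add the incoming byte
            if i - start >= WINDOW:              # window full: drop oldest byte
                h = (h - data[i - WINDOW] * POW_W) % MOD
            size = i - start + 1
            if ((h & MASK) == 0 and size >= MIN_CHUNK) or size >= MAX_CHUNK:
                yield start, size, hashlib.sha1(data[start:i + 1]).hexdigest()
                start, h = i + 1, 0              # restart the window in the next chunk
        if start < len(data):                    # emit the tail chunk
            yield start, len(data) - start, hashlib.sha1(data[start:]).hexdigest()

Because chunk boundaries depend on content rather than on fixed offsets, inserting or deleting a few bytes in a new file version shifts only the chunks near the edit, so most chunks between versions keep identical fingerprints and can be eliminated as duplicates.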
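Given such chunks, duplicate elimination itself reduces to indexing chunks by a collision-resistant fingerprint and storing each unique chunk once. The toy index below (again a generic sketch, not the FSBSM model's double-layer structure or node-selection strategy) shows the idea, reusing the hypothetical chunk() function from the sketch above.

    class ChunkStore:
        """Toy fingerprint index: each unique chunk is stored once; a file
        is recorded as the ordered list of its chunk fingerprints."""
        def __init__(self):
            self.chunks = {}       # sha1 hex -> chunk bytes
            self.recipes = {}      # file name -> list of sha1 hex

        def backup(self, name: str, data: bytes):
            recipe = []
            for off, size, fp in chunk(data):
                self.chunks.setdefault(fp, data[off:off + size])  # skip duplicates
                recipe.append(fp)
            self.recipes[name] = recipe

        def restore(self, name: str) -> bytes:
            return b"".join(self.chunks[fp] for fp in self.recipes[name])

Backing up successive versions of a file through such a store writes only the chunks that actually changed, which is the resource saving that motivates de-duplicating disaster recovery systems.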
Keywords/Search Tags: Disaster recovery, Data deduplication, MBasedSWC-Varsize, FSBSM