Research And Implementation Of Data De-duplication Technology In Virtual Tape Library

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: L He
GTID: 2248330398475005
Subject: Computer application technology
Abstract/Summary:
With global informatization, society is evolving into an information society. Governments and organizations in every sector depend ever more heavily on information resources, information technology, and the information industry, and their demand for storage space is growing fast. Backups contain large amounts of identical data and documents, which take up a lot of expensive disk space. VTL (Virtual Tape Library) is widely used for data storage in government and industry because of its high performance, low failure rate, high reliability, and other advantages. It is therefore imperative to study de-duplication technology that can delete duplicate data in a VTL.

First, this thesis identifies the problems and shortcomings of existing data de-duplication technology by analyzing the current state of VTL and de-duplication technology at home and abroad, and establishes the starting point of the study. Second, it studies the basic principle of data de-duplication and then implements a block-level data de-duplication system through the following steps: dividing files into blocks, calculating the hash value of each block, looking up the hash value, and saving the hash value. To solve the "hash conflict" (hash collision) problem of the hash algorithm in data de-duplication and to improve data security, it uses the zipper method (separate chaining) to optimize the hash algorithm; to improve the efficiency of detecting duplicate data blocks, it improves the content-based detection algorithm; and to improve the efficiency of hash table lookup, it optimizes the hash table with Bloom Filter technology.

Finally, the system is tested and analyzed in an environment of VTL and backup software. The test results show that the improved CDC data detection algorithm achieves a higher de-duplication rate than the FSP and SB algorithms and general-purpose data compression software.
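
The four steps named in the abstract (divide into blocks, hash each block, look up the hash, save the hash) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes fixed-size blocks for brevity, whereas the thesis uses an improved content-based algorithm, and it assumes SHA-256 as the fingerprint. The names `BLOCK_SIZE`, `BloomFilter`, and `DedupStore` are hypothetical. Chaining is modeled as a list of block ids per fingerprint, with a byte-for-byte comparison to resolve collisions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the thesis does not state one here


class BloomFilter:
    """Small Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class DedupStore:
    """Block-level de-duplicating store (illustrative only)."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}   # fingerprint -> chain (list) of block ids
        self.blocks = []  # unique block payloads, indexed by block id

    def _put_block(self, block):
        fingerprint = hashlib.sha256(block).digest()
        # The Bloom filter lets most new blocks skip the index lookup.
        if self.bloom.might_contain(fingerprint):
            # Walk the chain and compare bytes, so two different blocks
            # with the same fingerprint are never merged.
            for block_id in self.index.get(fingerprint, []):
                if self.blocks[block_id] == block:
                    return block_id
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        self.index.setdefault(fingerprint, []).append(block_id)
        self.bloom.add(fingerprint)
        return block_id

    def write(self, data):
        """Split data into blocks and return the list of block ids (the recipe)."""
        return [self._put_block(data[i:i + BLOCK_SIZE])
                for i in range(0, len(data), BLOCK_SIZE)]

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.blocks[block_id] for block_id in recipe)
```

In this sketch, writing three identical blocks followed by one different block stores only two payloads, and `read` reconstructs the original bytes from the recipe. A content-defined chunker could replace the fixed-size slicing in `write` without changing the lookup path.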
Keywords/Search Tags:VTL, De-duplicate, Hash Algorithm, Data Detection Algorithm, Bloom Filter