
Research And Implementation Of The Mass Data De-duplication System

Posted on: 2012-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Jia
Full Text: PDF
GTID: 2178330332987491
Subject: Control theory and control engineering
Abstract/Summary:
De-duplication is an emerging storage technology that is being actively pursued by major storage vendors and research institutes. It eases the pressure on storage capacity and makes data storage more efficient. This thesis studies a mass data de-duplication system, a new type of data storage mechanism built on de-duplication technology. The system removes large amounts of redundant data and reduces network bandwidth usage, which is significant for saving storage space and cutting storage cost.

First, the thesis analyzes several mainstream data de-duplication algorithms and presents their advantages and disadvantages. It then puts forward a parallel hierarchical de-duplication algorithm, PHD (Parallel-Hierarchical De-duplication). PHD improves the de-duplication ratio by de-duplicating hierarchically, from a coarse-grained level down to a fine-grained level, and addresses the low rate of duplicate elimination by introducing parallel processing to make full use of the computer's multi-core resources. Second, with regard to the data storage and query technologies used in a de-duplication system, two storage structures are implemented, one based on dynamic hashing and one based on key-value storage, and a three-level query policy consisting of a Bloom filter check, a memory check and a disk check is proposed to improve the efficiency of the system. Finally, after constructing a large-scale concurrent communication platform, the thesis completes the mass data de-duplication system using the PHD algorithm, the storage structure based on dynamic hashing and the three-level query policy, and tests the whole system for both functionality and performance.

The experimental results show that the mass data de-duplication system implemented in the thesis achieves not only a high de-duplication ratio but also a better rate of duplicate elimination.
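The hierarchical de-duplication and the three-level query policy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no details of PHD, so the fixed 4096-byte chunking, the SHA-1 fingerprints, the LRU memory cache, the in-memory dictionary standing in for the on-disk index, the thread pool standing in for the parallel stage, and all names (`BloomFilter`, `FingerprintIndex`, `dedup_file`) are assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4096  # assumed fixed-size chunking; the thesis may chunk differently


def fingerprint(data: bytes) -> bytes:
    """Content fingerprint (SHA-1 is an assumption, not taken from the thesis)."""
    return hashlib.sha1(data).digest()


class BloomFilter:
    """Level 1: a compact filter that can say 'definitely not stored'."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class FingerprintIndex:
    """Three-level lookup: Bloom filter, then memory cache, then 'disk'."""

    def __init__(self, cache_capacity: int = 1024):
        self.bloom = BloomFilter()
        self.cache = OrderedDict()  # level 2: LRU cache of recent fingerprints
        self.cache_capacity = cache_capacity
        self.disk = set()           # level 3: stand-in for the on-disk index
        self.disk_lookups = 0       # counts how often level 3 was reached

    def _remember(self, fp: bytes) -> None:
        self.cache[fp] = True
        self.cache.move_to_end(fp)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def contains(self, fp: bytes) -> bool:
        if not self.bloom.might_contain(fp):
            return False            # level 1: certainly new, no further lookup
        if fp in self.cache:
            self.cache.move_to_end(fp)
            return True             # level 2: found in memory
        self.disk_lookups += 1
        if fp in self.disk:
            self._remember(fp)
            return True             # level 3: found on disk
        return False                # Bloom filter false positive

    def add(self, fp: bytes) -> None:
        self.bloom.add(fp)
        self.disk.add(fp)
        self._remember(fp)


def dedup_file(data: bytes, file_index: FingerprintIndex,
               chunk_index: FingerprintIndex, pool: ThreadPoolExecutor) -> int:
    """Return the number of bytes that actually had to be stored."""
    file_fp = fingerprint(data)
    if file_index.contains(file_fp):
        return 0                    # coarse-grained hit: the whole file is a duplicate
    file_index.add(file_fp)
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    fps = list(pool.map(fingerprint, chunks))  # fingerprint chunks concurrently
    stored = 0
    for fp, chunk in zip(fps, chunks):         # fine-grained, chunk-level check
        if not chunk_index.contains(fp):
            chunk_index.add(fp)
            stored += len(chunk)
    return stored
```

In this sketch, a file that has been seen before is rejected at the coarse level without any chunking, while a new file that shares some chunks with earlier files only stores the chunks that are new. A lookup for a new fingerprint is usually answered by the Bloom filter alone, so the slower disk level is reached only for likely duplicates that are not in the cache and for rare false positives.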
Keywords/Search Tags: de-duplication, mass data, hierarchy, parallel processing