Research on a Hierarchical Data Deduplication Technology

Posted on: 2014-11-26  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Wang  Full Text: PDF
GTID: 2268330401965813  Subject: Computer application technology
Abstract/Summary:
With the rapid growth in the amount of data stored by enterprises and individual users, the storage capacity required of data centers keeps rising. Statistics show that a large part of this data is redundant, so detecting and removing redundant data to improve data center storage performance has become increasingly urgent and is of great practical value.

The thesis first introduces background knowledge and related technologies for deduplication and analyzes several major deduplication products. On this basis, the following work was completed.

First, the thesis designs a hierarchical deduplication architecture. The control server and the information server are separated and are used for transaction processing and file metadata storage, respectively. In the information server, data is stored hierarchically: file fingerprint information is kept permanently in memory; sub-block metadata is placed on solid-state disks; and the actual file data is stored on inexpensive storage devices. Memory and disk space are thus used rationally, and efficiency is improved.

Second, in the pre-processing module, data is classified, and a byte-based hash-value maximum increasing sequence partition algorithm is proposed, which solves the hard block problem in variable-length blocking. To address the data collision problem in the deduplication system, the classic SHA-1 algorithm is optimized.
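For orientation, the tiered placement and fingerprint lookup described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the class `TieredDedupStore`, the constant `BLOCK_SIZE`, and the use of fixed-size blocks are all hypothetical, three in-process dictionaries stand in for memory, solid-state disk, and cheap storage, and the fingerprint is the standard SHA-1 from Python's `hashlib` rather than the optimized variant.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical fixed block size, kept tiny so the example is readable


def sha1_hex(data: bytes) -> str:
    """Standard SHA-1 fingerprint (not the thesis's optimized variant)."""
    return hashlib.sha1(data).hexdigest()


class TieredDedupStore:
    """Toy stand-in for the information server's three storage tiers."""

    def __init__(self) -> None:
        self.file_index = {}  # tier 1 (memory): file fingerprint -> block fingerprints
        self.block_meta = {}  # tier 2 (SSD): block fingerprint -> reference count
        self.block_data = {}  # tier 3 (cheap storage): block fingerprint -> raw bytes

    def put(self, content: bytes) -> str:
        file_fp = sha1_hex(content)
        if file_fp in self.file_index:
            # Whole-file duplicate: answered from the in-memory tier alone.
            return file_fp
        block_fps = []
        for start in range(0, len(content), BLOCK_SIZE):
            block = content[start:start + BLOCK_SIZE]
            fp = sha1_hex(block)
            if fp not in self.block_meta:
                self.block_meta[fp] = 0
                self.block_data[fp] = block  # only unseen blocks reach the data tier
            self.block_meta[fp] += 1
            block_fps.append(fp)
        self.file_index[file_fp] = block_fps
        return file_fp

    def get(self, file_fp: str) -> bytes:
        # Rebuild the file from its ordered block fingerprints.
        return b"".join(self.block_data[fp] for fp in self.file_index[file_fp])
```

Storing `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` in one store keeps only four distinct blocks, because the two shared blocks are written once and merely reference-counted. Fixed-size blocks are used here only for brevity; the variable-length partition algorithm mentioned above is not specified in this abstract and is not reproduced.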
The step function of the SHA-1 algorithm is improved, the extension degree of message modification is enhanced, and the length of the message digest is increased, so that the collision resistance of SHA-1 is improved and the rate of erroneous deletion is decreased.

A multi-dimensional Bloom Filter algorithm is proposed, which extends the bit array of the common Bloom Filter algorithm, reduces the misjudgment (false positive) rate, solves the problem of duplicate detection over huge amounts of data, enhances the dynamic expandability of the Bloom Filter algorithm in a distributed environment, and improves the scalability of the whole deduplication system.

The thesis also applies the hierarchical deduplication architecture to RFID networks, treating RFID tag data as preprocessed metadata that is organized hierarchically and stripped of redundancy.

Finally, extensive experimental tests are conducted. The results show that the optimized SHA-1 algorithm effectively improves overall collision resistance; the multi-dimensional Bloom Filter algorithm effectively reduces the misjudgment rate and enhances dynamic scalability; the multi-level RFID deduplication algorithm outperforms existing algorithms in time efficiency and deduplication rate, although a certain number of misjudgments remain; and the whole system reaches the desired throughput and deduplication rate.
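As background for the Bloom Filter work summarized above, a minimal sketch of the common (one-dimensional) Bloom filter is given here. It is not the multi-dimensional variant, whose construction this abstract does not specify. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the choice to derive hash positions by salting SHA-1 are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Common Bloom filter: k salted hashes over a single bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must stay below 256 for the one-byte salt
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive one bit position per hash by prefixing a one-byte salt.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

In a deduplication pipeline, a filter of this kind would typically sit in front of the fingerprint index: a `False` answer lets a new block skip the index lookup entirely, while a `True` answer still has to be confirmed against the index, since false positives (the misjudgments discussed above) are possible.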
Keywords/Search Tags: Deduplicate system, Hierarchical architecture, RFID tag, SHA-1 optimization, Multi-dimensional Bloom Filter algorithm