A Kind Of Hierarchical Data Deduplication Technology Research

Posted on:2014-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Wang

Full Text:PDF

GTID:2268330401965813

Subject:Computer application technology

Abstract/Summary:

With the rapid growth in the amount of data stored by enterprises and individual users, the storage capacity required of data centers keeps increasing. Statistics show that a large part of this data is redundant, so detecting and removing redundant data to improve data center storage performance has become both urgent and practically important.

The thesis begins by introducing background knowledge and related technologies for deduplication and analyzing several major deduplication products. On this basis, the following work was completed.

First, the thesis designs a hierarchical deduplication architecture. The control server and the information server are separated and handle transaction processing and file metadata storage, respectively. In the information server, data is stored hierarchically: file fingerprint information is kept permanently in memory; sub-block metadata is placed on solid-state disks; and the actual file data is stored on cheap storage devices. In this way memory and disk space are used rationally and efficiency is improved.

Second, in the pre-processing module, data is classified, and a byte-based partition algorithm using the maximum increasing sequence of hash values is proposed, which solves the hard-block problem in variable-length blocking. To address the data collision problem in the deduplication system, the classic SHA-1 algorithm is optimized: the step function is improved, the extension degree of message modification is enhanced, and the length of the message digest is increased, which improves the collision resistance of SHA-1 and lowers the rate of erroneous deletion. A multi-dimensional Bloom Filter algorithm is also proposed. It extends the bit array of the common Bloom Filter algorithm, reduces the misjudgment rate, solves the problem of duplicate detection over huge amounts of data, enhances the dynamic expansibility of the Bloom Filter in a distributed environment, and improves the scalability of the whole deduplication system.

The thesis then applies the hierarchical deduplication architecture to an RFID network, treating RFID tag data as preprocessed metadata that is organized hierarchically and stripped of redundancy.

Finally, extensive experimental tests are conducted. The results show that the optimized SHA-1 algorithm effectively improves overall collision resistance; the multi-dimensional Bloom Filter algorithm effectively reduces the misjudgment rate and enhances dynamic scalability; the multi-level RFID deduplication algorithm outperforms existing algorithms in time efficiency and deduplication rate, although a certain number of misjudgments remain; and the whole system reaches the desired throughput and deduplication rate.
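
The tiered lookup described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes fixed-size blocks instead of the proposed variable-length partitioning, the unmodified SHA-1 from Python's hashlib instead of the optimized variant, and a standard single-array Bloom filter instead of the multi-dimensional one. The names BloomFilter and TieredDedupStore are hypothetical, and plain dictionaries stand in for the memory, solid-state disk and cheap-storage tiers.

```python
import hashlib


class BloomFilter:
    """Standard single-array Bloom filter (an assumption, not the thesis's variant)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two slices of one SHA-1 digest.
        digest = hashlib.sha1(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class TieredDedupStore:
    """Toy three-tier store; dicts stand in for memory, SSD and cheap disk."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.file_index = {}   # tier 1 (memory): file fingerprint -> block fingerprints
        self.block_filter = BloomFilter()
        self.block_meta = {}   # tier 2 (SSD): block fingerprint -> reference count
        self.block_data = {}   # tier 3 (cheap disk): block fingerprint -> raw bytes

    def put(self, data: bytes) -> str:
        file_fp = hashlib.sha1(data).hexdigest()
        if file_fp in self.file_index:
            return file_fp  # whole-file duplicate, nothing new to store
        block_fps = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha1(block).hexdigest()
            # Consult the filter first; confirm a "possibly seen" answer in the metadata tier.
            if self.block_filter.might_contain(fp.encode()) and fp in self.block_meta:
                self.block_meta[fp] += 1
            else:
                self.block_filter.add(fp.encode())
                self.block_meta[fp] = 1
                self.block_data[fp] = block
            block_fps.append(fp)
        self.file_index[file_fp] = block_fps
        return file_fp

    def get(self, file_fp: str) -> bytes:
        return b"".join(self.block_data[fp] for fp in self.file_index[file_fp])
```

In this sketch, storing b"AAAABBBBCCCC" and then b"AAAAXXXXCCCC" with a block size of 4 keeps only four distinct blocks, because the shared AAAA and CCCC blocks are referenced rather than stored again. The Bloom filter only rules out blocks that are certainly new; a positive answer is still confirmed against the metadata tier, so a false positive cannot cause an erroneous deletion here.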

Keywords/Search Tags:

Deduplicate system, Hierarchical architecture, RFID tag, SHA-1 optimization, Multi-dimensional Bloom Filter algorithm

Related items

1	Research And Application Of Bloom Filter In Duplicated Webpages Deletion
2	Research And Application Of Data Deduplication Technology Based On Bloom Filter
3	Multi-Bloom-Filter Query Algorithms And Their Applications
4	OBF-Index:A Distributed Multi-Dimensional Index Based On Ordinal Bloom Filter
5	Privacy Preserved Bloom Filter And Key-value Based Bloom Filter
6	Researches And Applications On Efficient Bloom Filter For Big Data
7	Research On Techniques For Large-scale RFID Tag Identification, Detection And Estimation
8	Research And Implementation Based On RFID Of Warehouse Cargo Plane Positioning Algorithm
9	The Research On The Muti-keywords Search Technology Over P2P Network Based On Bloom Filter
10	The Design Of Bloom Filter Algorithm For Key-value Storage