Research On Distributed Deduplication Technology

Posted on: 2019-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Guo
Full Text: PDF
GTID: 2428330545460074
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and the Internet of Things, more and more data is being stored in cloud storage systems. Much of this data is redundant, which not only occupies a large amount of storage space but also reduces the storage efficiency of the cloud storage system. Deduplication technology offers a good solution to these problems: it can effectively optimize the storage system and improve the efficiency of data transmission over the network. This thesis analyzes several key technologies of the deduplication system and improves and optimizes the key parts that affect deduplication. The main innovations are the following:
(1) To address the low blocking efficiency of traditional deduplication systems, a blocking method with asymmetric maximum value (DAM) is proposed. The DAM algorithm uses a fixed-size window and a variable-size window to find the maximum byte, which serves as the split point. The algorithm first looks for the maximum byte value in the fixed window; if a byte is larger than all the values in the fixed window, that value is taken as the maximum byte and the cut point is determined. Otherwise, the algorithm moves on to the next byte and continues until it finds the maximum value.
(2) To address the hash collision problem in traditional data block fingerprint algorithms, the Keccak algorithm of SHA-3 is used to generate and match data block fingerprints when detecting duplicate blocks, replacing the traditional SHA-2 algorithm for calculating the fingerprint value of a data block.
(3) To address the low deduplication efficiency of traditional deduplication systems, the improved algorithms (the content blocking algorithm based on asymmetric maxima and the data block fingerprint algorithm based on Keccak) are applied to a distributed platform. A Hadoop-based deduplication system was built and optimized for performance.
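
The first two innovations can be illustrated with a minimal single-machine Python sketch. It is not the thesis implementation. The cut rule is an assumption borrowed from asymmetric-extremum style chunking: a byte becomes the reference maximum when it exceeds every earlier byte of the current block (the variable-size window), and the block is cut once `window` following bytes (the fixed-size window) fail to exceed it. The names `split_asymmetric_max`, `fingerprint`, `deduplicate`, and `restore`, and the default window of 32 bytes, are hypothetical. The fingerprint uses `hashlib.sha3_256`, the standardized SHA-3 instance of Keccak, which may differ from the variant used in the thesis.

```python
import hashlib


def split_asymmetric_max(data: bytes, window: int = 32) -> list:
    """Split data into variable-size blocks (assumed cut rule, see lead-in)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    blocks = []
    start, n = 0, len(data)
    while start < n:
        max_val, max_pos = data[start], start
        cut = n  # fall back to the end of the data if no cut point is found
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New maximum: it exceeds every earlier byte of this block.
                max_val, max_pos = data[i], i
            elif i == max_pos + window:
                # The fixed window after the maximum held no larger byte.
                cut = i + 1
                break
        blocks.append(data[start:cut])
        start = cut
    return blocks


def fingerprint(block: bytes) -> str:
    """Block fingerprint using SHA3-256 (standardized Keccak)."""
    return hashlib.sha3_256(block).hexdigest()


def deduplicate(data: bytes, store: dict, window: int = 32) -> list:
    """Store each unique block once; return the fingerprints that rebuild data."""
    recipe = []
    for block in split_asymmetric_max(data, window):
        fp = fingerprint(block)
        store.setdefault(fp, block)  # keep only the first copy of a block
        recipe.append(fp)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Because cut points depend only on local byte values, an insertion near the start of a file shifts only nearby block boundaries, so later blocks still match fingerprints already in `store`. In a distributed setting such as the Hadoop-based system described above, `store` would be replaced by a shared fingerprint index; that part is not sketched here.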
Keywords/Search Tags:Deduplication, DAM, Keccak Algorithm, Distributed