
Application And Implementation Of De-duplication Technology In Cloud Storage

Posted on: 2015-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhang
GTID: 2308330452457218
Subject: Computer technology
Abstract/Summary:
With the development of electronic information technology and the Internet industry, businesses and individuals generate a large amount of data, and important data needs to be stored securely. Cloud storage technology is a good solution to this problem. However, most of this data is duplicated, which consumes large amounts of storage resources and wastes a lot of network bandwidth.

To solve these problems, we design a high-performance de-duplication system to reduce the storage and network transmission of duplicate data. We propose different chunking algorithms for different types of documents. For conventional files in particular, we improve the sliding window algorithm to greatly increase performance while losing little of the de-duplication rate. A distributed system is designed to partition the fingerprint database and process the data in parallel. This system consists of a Nameserver and multiple Dataservers. The Nameserver manages the address tables of files. Each Dataserver manages the fingerprint database for its node and stores the data. An efficient structure is proposed to speed up searches of the address tables of files, and a high-performance fingerprint database is proposed to further enhance the performance of the system.

We test the performance of this system. First, we test the new chunking algorithm. The result shows that it achieves a 30-fold performance improvement at the cost of a 10% loss in de-duplication rate compared with the sliding window chunking algorithm. Second, we test the performance of the distributed system. The result shows that when the number of Dataservers is increased to four, the throughput of the system increases correspondingly, by a factor of 3.12. Finally, with one Dataserver, we compare the performance of the system with de-duplication against the performance of the system without de-duplication. The result shows that the system with de-duplication reaches no more than 67% of the performance of the system without de-duplication.
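The architecture described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the improved chunking algorithm, the address-table structure, or the fingerprint database design, so the code below assumes a plain rolling-hash sliding window for chunk boundaries, SHA-1 as the fingerprint, and routing by fingerprint modulo the number of Dataservers. The names `chunk`, `Dataserver`, `Nameserver`, `store`, and `load`, and all constants, are hypothetical and chosen only for illustration.

```python
import hashlib

# All constants are illustrative assumptions, not values from the thesis.
WINDOW = 16                     # bytes covered by the sliding window
MASK = (1 << 6) - 1             # boundary when the low 6 bits of the hash are zero
MIN_CHUNK, MAX_CHUNK = 32, 256  # bounds on chunk length in bytes
BASE, MOD = 257, (1 << 31) - 1  # parameters of the polynomial rolling hash


def chunk(data: bytes):
    """Split data at content-defined boundaries using a rolling hash."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the byte that falls out of it.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class Dataserver:
    """Holds one partition of the fingerprint database and the chunk data."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> chunk bytes

    def put(self, fp: str, chunk_bytes: bytes) -> bool:
        """Store the chunk if unseen; return True only when it was new."""
        if fp in self.fingerprints:
            return False
        self.fingerprints[fp] = chunk_bytes
        return True


class Nameserver:
    """Keeps the address table: file name -> ordered list of fingerprints."""

    def __init__(self, n_dataservers: int):
        self.dataservers = [Dataserver() for _ in range(n_dataservers)]
        self.address_table = {}

    def _route(self, fp: str) -> Dataserver:
        # Assumed partitioning rule: fingerprint prefix modulo server count.
        return self.dataservers[int(fp[:8], 16) % len(self.dataservers)]

    def store(self, name: str, data: bytes) -> int:
        """Store a file and return how many chunks were actually new."""
        fps, new_chunks = [], 0
        for c in chunk(data):
            fp = hashlib.sha1(c).hexdigest()
            if self._route(fp).put(fp, c):
                new_chunks += 1
            fps.append(fp)
        self.address_table[name] = fps
        return new_chunks

    def load(self, name: str) -> bytes:
        """Rebuild a file from its address-table entry."""
        return b"".join(
            self._route(fp).fingerprints[fp] for fp in self.address_table[name]
        )
```

Storing the same bytes under a second name adds an address-table entry but no new chunks, which is the storage saving the abstract refers to. Because each fingerprint maps to exactly one Dataserver, lookups for different chunks can in principle proceed in parallel on separate nodes.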
Keywords/Search Tags: Chunk algorithms, Data de-duplication, Fingerprint database, Distributed systems, Address tables