
Research And Implementation Of High Performance Deduplication Technology For Cloud Storage

Posted on: 2023-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: P Zhou  Full Text: PDF
GTID: 2558306830484304  Subject: Software engineering
Abstract/Summary:
With the rapid development of information technologies such as cloud computing and big data, modern society has entered the era of big data. Data is growing explosively; in particular, enterprises periodically back up data to cloud storage for safety, generating a huge amount of duplicate data and putting enormous pressure on cloud storage vendors. To eliminate duplicate data and reduce storage overhead, it is important to apply deduplication technology to cloud storage systems.

Starting from the practical requirements of cloud storage data backup, this thesis studies two key steps of deduplication: data chunking and fingerprint indexing. We propose Ultra FS, a small-window CDC algorithm based on similarity detection, together with a similarity-based segment fingerprint update strategy and the corresponding Top-k-Near-p fingerprint cache prefetching method. We further design and implement a deduplication-based backup system. The specific contributions are as follows:

1. To address the high computational overhead of CDC chunking, unstable chunk-length distribution, poor resistance to boundary shift, and weak detection of low-entropy strings, this thesis proposes the Ultra FS CDC algorithm. Ultra FS offers more balanced overall performance: its chunking speed is 2-10x that of state-of-the-art CDC algorithms, its chunking stability matches that of Rabin, and its low-entropy-string detection capability is 40% higher than that of AE-opt V2 without affecting chunking speed. The effectiveness of the algorithm is verified through both theoretical and experimental analysis.

2. To solve the disk-access bottleneck in fingerprint indexing, this thesis proposes a strategy that decides whether to update a segment identifier based on segment similarity, and designs the Top-k-Near-p cache prefetching method by exploiting the similarity and locational locality of data segments. Experimental results show that this method reduces fingerprint update overhead by more than 60% and improves indexing speed by more than 15% compared with the BLC algorithm without fragmentation.

3. A deduplication-based backup system is developed, featuring a pipelined backup workflow that fully leverages the computing capability of multi-core CPUs and a double-ended indexing optimization that relieves the additional load on the server's fingerprint index. The system is put through functional and performance tests, and is further evaluated in Ceph cloud storage and OpenStack image management scenarios. The results show that the backup system efficiently eliminates redundant data and meets enterprise and personal backup requirements; in the OpenStack image management scenario, it saves substantial network bandwidth and shrinks image storage space, which can reduce costs considerably for enterprises.
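To make the chunking step concrete, the following is a minimal sketch of content-defined chunking (CDC) with a Gear-style rolling hash. It illustrates the general technique only; the thesis's Ultra FS algorithm (small window, similarity detection) is not reproduced here, and the table, mask, and size limits below are illustrative assumptions.

```python
# Minimal Gear-style CDC sketch: a rolling hash over the byte stream
# declares a chunk boundary wherever (hash & MASK) == 0, subject to
# minimum and maximum chunk sizes. Not the thesis's Ultra FS algorithm.
import random

random.seed(42)
# 256 random 64-bit values, one per possible byte value (assumed table).
GEAR = [random.getrandbits(64) for _ in range(256)]

MASK = 0x0000D93003530000   # 13 mask bits => ~8 KiB expected chunk size
MIN_CHUNK = 2048            # resist pathologically small chunks
MAX_CHUNK = 65536           # bound worst-case chunk length

def cdc_chunks(data: bytes):
    """Split data into content-defined chunks."""
    chunks = []
    start = 0
    while start < len(data):
        h = 0
        end = min(start + MAX_CHUNK, len(data))
        cut = end
        for i in range(start, end):
            # Gear rolling hash: shift out old influence, add new byte.
            h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
            if i - start + 1 >= MIN_CHUNK and (h & MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because boundaries depend on content rather than fixed offsets, inserting bytes near the front of a file shifts only the affected chunks, which is what gives CDC its resistance to boundary shift.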
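The fingerprint-indexing bottleneck described above can be sketched as follows: when a lookup misses the in-memory cache and resolves to an on-disk segment, the whole segment's fingerprints (and, here, its immediate neighbours as a stand-in for the "near-p" neighbourhood) are prefetched, so that subsequent fingerprints from the same backup stream hit in memory. All class and parameter names are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch of locality-based fingerprint cache prefetching.
from collections import OrderedDict

class FingerprintIndex:
    def __init__(self, cache_capacity=4):
        self.segments = []          # simulated on-disk fingerprint segments
        self.seg_of = {}            # full index: fingerprint -> segment id
        self.cache = OrderedDict()  # in-memory LRU: fingerprint -> segment id
        self.capacity = cache_capacity
        self.disk_reads = 0         # counts segment fetches from "disk"

    def add_segment(self, fingerprints):
        sid = len(self.segments)
        self.segments.append(list(fingerprints))
        for fp in fingerprints:
            self.seg_of[fp] = sid
        return sid

    def _prefetch(self, sid):
        # Load the hit segment and its neighbours into the LRU cache.
        for s in (sid, sid - 1, sid + 1):
            if 0 <= s < len(self.segments):
                self.disk_reads += 1
                for fp in self.segments[s]:
                    self.cache[fp] = s
                    self.cache.move_to_end(fp)
        while len(self.cache) > self.capacity * 1024:
            self.cache.popitem(last=False)   # evict least recently used

    def lookup(self, fp):
        """Return True if fp is a duplicate (seen before)."""
        if fp in self.cache:
            self.cache.move_to_end(fp)
            return True
        sid = self.seg_of.get(fp)
        if sid is None:
            return False
        self._prefetch(sid)   # one disk read serves many future lookups
        return True
```

The point of the sketch is the amortization: one disk access per segment instead of one per fingerprint, which is the kind of saving the indexing-speed improvement above relies on.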
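The pipelined backup workflow in contribution 3 can be sketched as a chain of stages connected by bounded queues, so that chunking, fingerprinting, and storing overlap in time. The stage split and names below are assumptions for illustration; the thesis system's actual stages may differ.

```python
# Illustrative three-stage backup pipeline: chunk -> fingerprint -> store.
# Stages run concurrently and hand work off through thread-safe queues;
# a sentinel value signals shutdown down the chain.
import hashlib
import queue
import threading

SENTINEL = None

def fingerprint_stage(in_q, out_q):
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put((hashlib.sha256(chunk).hexdigest(), chunk))

def store_stage(in_q, store):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            return
        fp, chunk = item
        store.setdefault(fp, chunk)   # deduplicated write: keep first copy

def run_backup(chunks):
    q1 = queue.Queue(maxsize=64)   # bounded queues apply backpressure
    q2 = queue.Ueue(maxsize=64) if False else queue.Queue(maxsize=64)
    store = {}
    t1 = threading.Thread(target=fingerprint_stage, args=(q1, q2))
    t2 = threading.Thread(target=store_stage, args=(q2, store))
    t1.start(); t2.start()
    for c in chunks:               # the chunking stage feeds the pipeline
        q1.put(c)
    q1.put(SENTINEL)
    t1.join(); t2.join()
    return store
```

In a real multi-core deployment each stage would run in its own process or across several workers; the queue-and-sentinel structure is what lets CPU-heavy fingerprinting proceed while earlier chunks are already being stored.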
Keywords/Search Tags:Deduplication, Chunking, Fingerprint Indexing Optimization, Backup System