
Research And Implementation Of High Performance Deduplication Technology For Cloud Storage

Posted on: 2023-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: P Zhou  Full Text: PDF
GTID: 2558306830484304  Subject: Software engineering
Abstract/Summary:
With the rapid development of information technologies such as cloud computing and big data, modern society has entered the era of big data. Data is growing explosively; in particular, enterprises periodically back up data to cloud storage for safety, generating a huge amount of duplicate data and putting enormous pressure on cloud storage vendors. To eliminate duplicate data and reduce storage overhead, it is important to apply deduplication technology to cloud storage systems.

Starting from the practical requirements of cloud storage data backup, this thesis studies two key steps of deduplication: data chunking and fingerprint indexing. We propose Ultra FS, a small-window CDC algorithm based on similarity detection, together with a similarity-based segment fingerprint update strategy and the corresponding Top-k-Near-p fingerprint cache prefetching method. We further design and implement a deduplication-based backup system. The specific contributions are as follows:

1. To address the high computational overhead of CDC chunking, unstable chunk-length distribution, poor resistance to boundary shift, and weak detection of low-entropy strings, this thesis proposes the Ultra FS CDC algorithm. Ultra FS offers more balanced overall performance: its chunking speed is 2-10x that of state-of-the-art CDC algorithms, its chunking stability matches that of Rabin, and its low-entropy-string detection capability is 40% higher than that of AE-opt V2 without affecting chunking speed. The effectiveness of the algorithm is verified through both theoretical and experimental analysis.

2. To solve the disk-access bottleneck in fingerprint indexing, this thesis proposes a strategy that decides whether to update a segment identifier based on segment similarity, and designs the Top-k-Near-p cache prefetching method by exploiting the similarity and locational locality of data segments. Experimental results show that this method reduces fingerprint update overhead by more than 60% and improves indexing speed by more than 15% compared with the BLC algorithm without fragmentation.

3. A deduplication-based backup system is developed, featuring a pipelined backup workflow that fully leverages the computing capability of multi-core CPUs and a double-ended indexing optimization that relieves the additional load on the server's fingerprint index. The system is put through functional and performance tests, and is further evaluated in Ceph cloud storage and OpenStack image management scenarios. The results show that the backup system efficiently eliminates redundant data and meets enterprise and personal backup requirements; in the OpenStack image management scenario, it saves substantial network bandwidth and shrinks image storage space, which can reduce costs considerably for enterprises.
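To make the chunking step concrete, the following is a minimal sketch of content-defined chunking (CDC) with a Gear-style rolling hash. It illustrates the general technique only; the thesis's Ultra FS algorithm (small window, similarity detection) is not reproduced here, and the table, mask, and size limits below are illustrative assumptions.

```python
# Minimal Gear-style CDC sketch: a rolling hash over the byte stream
# declares a chunk boundary wherever (hash & MASK) == 0, subject to
# minimum and maximum chunk sizes. Not the thesis's Ultra FS algorithm.
import random

random.seed(42)
# 256 random 64-bit values, one per possible byte value (assumed table).
GEAR = [random.getrandbits(64) for _ in range(256)]

MASK = 0x0000D93003530000   # 13 mask bits => ~8 KiB expected chunk size
MIN_CHUNK = 2048            # resist pathologically small chunks
MAX_CHUNK = 65536           # bound worst-case chunk length

def cdc_chunks(data: bytes):
    """Split data into content-defined chunks."""
    chunks = []
    start = 0
    while start < len(data):
        h = 0
        end = min(start + MAX_CHUNK, len(data))
        cut = end
        for i in range(start, end):
            # Gear rolling hash: shift out old influence, add new byte.
            h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
            if i - start + 1 >= MIN_CHUNK and (h & MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because boundaries depend on content rather than fixed offsets, inserting bytes near the front of a file shifts only the affected chunks, which is what gives CDC its resistance to boundary shift.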
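The fingerprint-indexing bottleneck described above can be sketched as follows: when a lookup misses the in-memory cache and resolves to an on-disk segment, the whole segment's fingerprints (and, here, its immediate neighbours as a stand-in for the "near-p" neighbourhood) are prefetched, so that subsequent fingerprints from the same backup stream hit in memory. All class and parameter names are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch of locality-based fingerprint cache prefetching.
from collections import OrderedDict

class FingerprintIndex:
    def __init__(self, cache_capacity=4):
        self.segments = []          # simulated on-disk fingerprint segments
        self.seg_of = {}            # full index: fingerprint -> segment id
        self.cache = OrderedDict()  # in-memory LRU: fingerprint -> segment id
        self.capacity = cache_capacity
        self.disk_reads = 0         # counts segment fetches from "disk"

    def add_segment(self, fingerprints):
        sid = len(self.segments)
        self.segments.append(list(fingerprints))
        for fp in fingerprints:
            self.seg_of[fp] = sid
        return sid

    def _prefetch(self, sid):
        # Load the hit segment and its neighbours into the LRU cache.
        for s in (sid, sid - 1, sid + 1):
            if 0 <= s < len(self.segments):
                self.disk_reads += 1
                for fp in self.segments[s]:
                    self.cache[fp] = s
                    self.cache.move_to_end(fp)
        while len(self.cache) > self.capacity * 1024:
            self.cache.popitem(last=False)   # evict least recently used

    def lookup(self, fp):
        """Return True if fp is a duplicate (seen before)."""
        if fp in self.cache:
            self.cache.move_to_end(fp)
            return True
        sid = self.seg_of.get(fp)
        if sid is None:
            return False
        self._prefetch(sid)   # one disk read serves many future lookups
        return True
```

The point of the sketch is the amortization: one disk access per segment instead of one per fingerprint, which is the kind of saving the indexing-speed improvement above relies on.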
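The pipelined backup workflow in contribution 3 can be sketched as a chain of stages connected by bounded queues, so that chunking, fingerprinting, and storing overlap in time. The stage split and names below are assumptions for illustration; the thesis system's actual stages may differ.

```python
# Illustrative three-stage backup pipeline: chunk -> fingerprint -> store.
# Stages run concurrently and hand work off through thread-safe queues;
# a sentinel value signals shutdown down the chain.
import hashlib
import queue
import threading

SENTINEL = None

def fingerprint_stage(in_q, out_q):
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put((hashlib.sha256(chunk).hexdigest(), chunk))

def store_stage(in_q, store):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            return
        fp, chunk = item
        store.setdefault(fp, chunk)   # deduplicated write: keep first copy

def run_backup(chunks):
    q1 = queue.Queue(maxsize=64)   # bounded queues apply backpressure
    q2 = queue.Ueue(maxsize=64) if False else queue.Queue(maxsize=64)
    store = {}
    t1 = threading.Thread(target=fingerprint_stage, args=(q1, q2))
    t2 = threading.Thread(target=store_stage, args=(q2, store))
    t1.start(); t2.start()
    for c in chunks:               # the chunking stage feeds the pipeline
        q1.put(c)
    q1.put(SENTINEL)
    t1.join(); t2.join()
    return store
```

In a real multi-core deployment each stage would run in its own process or across several workers; the queue-and-sentinel structure is what lets CPU-heavy fingerprinting proceed while earlier chunks are already being stored.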
Keywords/Search Tags:Deduplication, Chunking, Fingerprint Indexing Optimization, Backup System