
Research On A File-level Data De-duplication Approach In Cloud Storage Systems

Posted on: 2020-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Huang
Full Text: PDF
GTID: 2428330590451157
Subject: Software engineering
Abstract/Summary:
According to statistics, about 30 to 60 percent of the data in global cloud storage systems is duplicated, and the proportion reaches up to 70 percent for ordinary users. However, duplicate-data processing technology is mostly applied in the backup field, and research on detecting duplicate files before upload is rare. A well-designed online file de-duplication scheme would undoubtedly bring a great improvement to file system performance.

Targeting file-level duplicate detection at the file system layer of cloud storage systems, this paper adopts a de-duplication method based on a client-server division of labor, which comprises two parts: first, a file pre-screening method based on a Bloom filter is proposed; second, a PIA algorithm is proposed for incremental segmented digest calculation of files. Finally, based on these methods, this paper designs the complete de-duplication system.

First, when a file is to be uploaded, pre-screening is performed: after comparing the file's objective attributes against a Bloom filter and a partitioned table, files that definitely do not exist in the system are uploaded directly without participating in any subsequent calculation. Second, for files that may already exist in the system, the PIA algorithm performs a detailed segment-by-segment comparison; after the file is uploaded, the work left unfinished by the client is continued by the server. The core idea of the whole process is that the client filters layer by layer, so files that do not exist in the system are uploaded to the server directly and excluded from further client-side computation, which improves the resource utilization of the server and reduces the cost on the client.

Finally, experiments are carried out on the FastDFS distributed file system, comparing the PIA algorithm proposed in this paper with the full-file digest de-duplication of FastDHT. The experimental results show that the PIA algorithm can identify and process duplicate files quickly without reducing the de-duplication rate, greatly relieving the burden on computing resources. The data show that in the best case the algorithm filters out a non-duplicate file within 2 ms, with CPU occupancy of 0.39% and memory growth of no more than 0.1 GB, and in the worst case its cost is the same as that of the full-file digest algorithm.
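To illustrate the pre-screening step described above, the following is a minimal Python sketch, not the thesis's implementation: the choice of "objective attributes" (file name and size) and the double-hashing scheme are assumptions for illustration. The key property it demonstrates is that a negative answer from the Bloom filter is definitive, so such a file can be uploaded directly with no digest computation, while a positive answer only means the file may exist and must go on to the detailed comparison.

    import hashlib

    class BloomFilter:
        """Minimal Bloom filter: 'no' is definitive, 'yes' may be a false positive."""

        def __init__(self, num_bits=1 << 20, num_hashes=7):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8)

        def _positions(self, key: bytes):
            # Derive k bit positions from one SHA-256 via double hashing.
            h = hashlib.sha256(key).digest()
            h1 = int.from_bytes(h[:8], "big")
            h2 = int.from_bytes(h[8:16], "big") | 1
            for i in range(self.num_hashes):
                yield (h1 + i * h2) % self.num_bits

        def add(self, key: bytes):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def might_contain(self, key: bytes) -> bool:
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def prescreen_key(name: str, size: int) -> bytes:
        # Hypothetical "objective attributes": file name and size.
        return f"{name}:{size}".encode()

    bf = BloomFilter()
    bf.add(prescreen_key("report.pdf", 104857600))
    if not bf.might_contain(prescreen_key("photo.jpg", 2048)):
        print("definitely new: upload directly, skip digest computation")
    else:
        print("possible duplicate: run segmented digest comparison")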
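The abstract does not spell out the PIA algorithm, but its stated behavior (rejecting a non-duplicate file after very little work in the best case, and matching the cost of a full-file digest in the worst case) is consistent with an early-exit segmented digest comparison. Below is a hedged Python sketch of that idea; the 4 MiB segment size, the MD5 per-segment hash, and the shape of the stored digest list are all assumptions, not details from the thesis.

    import hashlib
    from typing import Iterator

    SEGMENT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB segments

    def segment_digests(path: str) -> Iterator[str]:
        """Yield one digest per fixed-size segment, computed incrementally."""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(SEGMENT_SIZE)
                if not chunk:
                    break
                yield hashlib.md5(chunk).hexdigest()

    def is_duplicate(path: str, stored_digests: list[str]) -> bool:
        """Compare segment digests one by one, stopping at the first mismatch.

        Best case: a non-duplicate file is rejected after its first segment,
        so almost no CPU time is spent. Worst case (a true duplicate): every
        segment is hashed, the same total cost as a full-file digest.
        """
        count = 0
        for i, digest in enumerate(segment_digests(path)):
            if i >= len(stored_digests) or digest != stored_digests[i]:
                return False  # early exit: file differs, upload it
            count += 1
        return count == len(stored_digests)  # duplicate only if lengths match too

Under this reading, the division of labor in the thesis would have the client compute and compare the leading segments, and the server continue from wherever the client stopped, which matches the abstract's description of the server taking over the client's unfinished work.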
Keywords/Search Tags: File De-duplication, FastDFS Distributed File System, Bloom Filter, Digest Algorithm