
Research On Deduplication Technology In Cold Data Storage System

Posted on: 2021-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2518306104999889
Subject: Computer technology
Abstract/Summary:
In recent years, right-provisioned cold data storage systems have been widely used to store cold data because of their low energy consumption and low cost. Deduplication can remove redundant data from the system and further reduce the cost of cold data storage. However, traditional deduplication has no concept of groups, so global deduplication can store the chunks of a single file in different groups. Restoring the file then requires the related groups to be spun up in turn to read the chunks, resulting in unacceptable read latency. It is therefore important to choose a deduplication method that reduces the cost of the cold data storage system while maintaining read performance. To solve these problems, a double deduplication method based on intra-group deduplication and inter-group deduplication is proposed. The main work is as follows:
(1) Files in a group are deduplicated online at the chunk level. The file recipes and containers are stored on the recipe disk and in the group respectively, so that recipe analysis does not need to spin up the group, which improves the performance of inter-group deduplication.
(2) The identical data chunks between groups are counted, and the data between groups is tested and analyzed. The results show that the redundant data between groups contains many identical or similar files.
(3) Identical and similar files are rewritten to another group, completing inter-group deduplication offline, and a garbage collection method suitable for Destor is proposed.
(4) A prototype storage system, Cold Destor, based on double deduplication is designed. The system consists of three modules: Name Node, intra-group deduplication, and inter-group deduplication. Together they implement intra-group deduplication, identification of identical/similar files between groups, rewriting of identical/similar files, garbage collection, file restore, and so on.
Double deduplication ensures that a whole file is stored in one group, so the restore performance of the file is not affected. The test results show that applying double deduplication in a cold data storage system saves more than 39% of disk space compared with no deduplication. File read latency is reduced by more than 43% compared with global deduplication, and the reduction becomes larger as the number of groups increases.
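The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the Cold Destor implementation: the fixed chunk size, the SHA-1 fingerprints, the Jaccard similarity measure, the 0.5 threshold, and all names (Group, similarity, find_candidates) are assumptions made for illustration only. The sketch shows chunk-level deduplication inside one group, with recipes kept apart from chunk data, and a recipe-only comparison that flags identical or similar files between two groups without reading any chunk data.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, tiny for demonstration


@dataclass
class Group:
    """One storage group (illustrative). Chunk data and recipes are kept in
    separate maps to mirror the group / recipe-disk split in the abstract."""
    chunks: dict[str, bytes] = field(default_factory=dict)        # fingerprint -> chunk
    recipes: dict[str, list[str]] = field(default_factory=dict)   # file name -> fingerprints

    def write(self, name: str, data: bytes) -> None:
        """Intra-group deduplication: store each distinct chunk only once."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # no-op if the chunk is already stored
            recipe.append(fp)
        self.recipes[name] = recipe

    def restore(self, name: str) -> bytes:
        """Rebuild a file from its recipe; every chunk comes from this group."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])


def similarity(recipe_a: list[str], recipe_b: list[str]) -> float:
    """Jaccard similarity of two recipes' fingerprint sets (an assumed metric)."""
    a, b = set(recipe_a), set(recipe_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_candidates(g1: Group, g2: Group, threshold: float = 0.5):
    """Compare recipes only, so neither group's chunk store is touched."""
    found = []
    for n1, r1 in g1.recipes.items():
        for n2, r2 in g2.recipes.items():
            score = similarity(r1, r2)
            if score >= threshold:
                found.append((n1, n2, score))
    return found


g1, g2 = Group(), Group()
g1.write("a", b"AAAABBBBAAAA")   # three chunks, two of them distinct
g2.write("b", b"AAAABBBBCCCC")   # shares two distinct chunks with "a"
candidates = find_candidates(g1, g2)
```

In this toy run, file "a" occupies two stored chunks instead of three, and the pair ("a", "b") is flagged as similar. A real system would then rewrite such files into one group and reclaim the unreferenced chunks, which the sketch does not attempt.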
Keywords/Search Tags: cold data storage system, right-provisioned, deduplication, double deduplication