Research On Restore-Aware Similar Data Reduction Technique

Posted on: 2024-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: R X Xia
GTID: 2558307100989119
Subject: Electronic information
Abstract/Summary:
With the development of internet technology, global data volume is growing exponentially. Research indicates that as of 2022, the total volume of global data had exceeded 80 ZB and is projected to double every two years. Storage systems, and backup storage systems in particular, hold a significant amount of redundant data. Backup systems typically employ data deduplication to eliminate completely duplicate data blocks. However, data deduplication can cause data blocks to become increasingly fragmented, which degrades recovery performance. In certain application scenarios, recovery performance takes priority over compression ratio. Additionally, data deduplication can only eliminate completely identical data blocks, while similar data blocks still contain a large amount of redundancy. Delta compression can eliminate such redundancy, but the additional disk reads required to fetch reference data blocks can reduce backup throughput.

Observations show that when data block fragmentation is not severe, delta compression can simultaneously improve compression ratio and recovery performance with minimal impact on backup throughput. Based on this observation, this thesis proposes the Recovery-aware Similar Data Reduction Technique (RDC) to enhance both storage efficiency and recovery performance in backup systems. RDC comprises three strategies: recovery performance evaluation, sequential deduplication, and local redundancy elimination.

Since data block fragmentation directly degrades recovery performance, recovery performance can also serve as a measure of data block fragmentation. Recovery performance evaluation uses the position information of data blocks during backup to estimate recovery performance. When the estimated recovery performance falls below a threshold, indicating that earlier data blocks are significantly fragmented, RDC adopts the sequential deduplication strategy so that subsequent data blocks are stored sequentially and recovery performance improves. Sequential deduplication detects data blocks that are still stored sequentially in the system and references only these data blocks, thereby breaking the association between the current backup and previously fragmented data blocks. When the estimated recovery performance exceeds the threshold, RDC employs the local redundancy elimination strategy to improve compression ratio and recovery performance simultaneously. Local redundancy elimination uses both data deduplication and delta compression, while avoiding any association with fragmented data stored before the last sequential deduplication, thereby mitigating the decline in recovery performance.

RDC was evaluated on two real-world datasets. The experiments showed that, compared to a backup system using only data deduplication, RDC increased the compression ratio by 8.2 to 14.5 times. Compared to a backup system using data deduplication combined with a rewrite algorithm, RDC increased recovery performance by 30% to 49%.
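The switch between the two strategies can be illustrated with a minimal Python sketch. The abstract does not state how recovery performance is estimated or what threshold is used, so everything below is an assumption made for illustration: the metric (data blocks restored per container read, simulated with a small LRU container cache), the function names `estimate_restore_score` and `choose_strategy`, the `cache_size` parameter, and the threshold value are all hypothetical and are not taken from the thesis.

```python
from collections import OrderedDict


def estimate_restore_score(container_ids, cache_size=4):
    """Hypothetical proxy for recovery performance.

    container_ids lists, in backup order, the ID of the container that
    holds each data block. The score is the number of blocks restored per
    container read, assuming an LRU cache of `cache_size` containers.
    A higher score suggests less fragmentation.
    """
    if not container_ids:
        raise ValueError("container_ids must not be empty")
    cache = OrderedDict()
    reads = 0
    for cid in container_ids:
        if cid in cache:
            cache.move_to_end(cid)         # cache hit: refresh recency
        else:
            reads += 1                     # cache miss: one container read
            cache[cid] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return len(container_ids) / reads


def choose_strategy(score, threshold):
    """Pick a strategy from the estimated score.

    Treating a score equal to the threshold as "not fragmented" is an
    assumption; the abstract does not specify the boundary case.
    """
    if score < threshold:
        return "sequential_deduplication"
    return "local_redundancy_elimination"


if __name__ == "__main__":
    sequential = [0, 0, 0, 0, 1, 1, 1, 1]   # blocks packed in two containers
    fragmented = [0, 5, 1, 6, 2, 7, 3, 8]   # every block in a new container
    for layout in (sequential, fragmented):
        score = estimate_restore_score(layout, cache_size=2)
        print(score, choose_strategy(score, threshold=2.0))
```

In this sketch a well-packed backup scores high and keeps using deduplication plus delta compression, while a scattered one scores low and triggers sequential deduplication. A real system would likely calibrate the threshold against measured restore throughput.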
Keywords/Search Tags: backup system, delta compression, data deduplication, recovery performance