
Study On Data Fragmentation For Data Backup Systems

Posted on: 2018-06-02    Degree: Master    Type: Thesis
Country: China    Candidate: J Wen    Full Text: PDF
GTID: 2348330536968747    Subject: Engineering
Abstract/Summary:
With data growing aggressively, data backup systems need to store more and more backup data. To save storage resources, data deduplication, a lossless data compression technology, is widely used in data backup systems. The basic principle of deduplication is to find and remove redundant chunks. However, in deduplication-based data backup systems, removing redundant chunks leaves logically adjacent data chunks physically scattered across different places on disk, which turns retrieval operations from sequential into random and thus significantly degrades restore performance. These restore-performance-degrading chunks are called data fragments.

A variety of defragmentation schemes have been proposed in the academic community, including the capping algorithm (Capping), the context-based rewrite algorithm (CBR), the history-aware rewrite algorithm (HAR), and inline deduplication for primary storage systems (iDedup). The main idea of these schemes is to identify and rewrite data fragments so that most logically adjacent data chunks are also adjacent in physical storage. However, through theoretical analysis and experimental verification, we found that these schemes cannot accurately identify data fragments. Capping, CBR, and HAR share a common assumption that every read operation involves a large fixed-size window of contiguous chunks; this restricts fragment identification to a fixed-size read window and leads to many false positive detections, since data fragments vary in size and can appear at different, unpredictable address locations. iDedup, although it does not assume a large fixed-size read window, ignores the disk characteristics that have a significant effect on restore performance; during restore, each of its disk operations reads only a small amount of data. As a result, the existing schemes (Capping, CBR, HAR, and iDedup) rewrite many data chunks yet improve restore performance only marginally.

Based on these observations, we propose a more accurate defragmentation solution, called AEDefrag, which uses variable-sized, adaptively located data groups instead of fixed-size read windows to accurately identify and effectively remove data fragments. The basic idea is to calculate the effective data transmission bandwidth achieved when restoring (reading) a data group: if that bandwidth is lower than the bandwidth the user expects, the valid data in the data group is identified as data fragments; otherwise, it is not. In our experiments, compared to the existing defragmentation schemes, AEDefrag improves the deduplication ratio by about 1% to 9% and restore performance by about 54% to 263%.
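To make the fixed-window assumption discussed above concrete, the sketch below shows a simplified, Capping-style rewrite decision in Python. It is a minimal illustration only: the cap value and the identifiers (cap_segment, CAP) are assumptions chosen for this sketch, not code from the thesis or from the original Capping work.

from collections import Counter

CAP = 10  # assumed cap on old containers referenced per segment (illustrative)

def cap_segment(chunks):
    """One fixed-size segment of the backup stream is given as a list of
    (fingerprint, container_id) pairs, with container_id None for new chunks.
    Keep references only to the CAP most-referenced old containers; duplicate
    chunks stored in any other container are rewritten."""
    counts = Counter(cid for _, cid in chunks if cid is not None)
    kept = {cid for cid, _ in counts.most_common(CAP)}
    # Fingerprints whose container fell outside the cap: rewrite them into
    # new containers so a restore touches at most CAP old containers.
    return [fp for fp, cid in chunks if cid is not None and cid not in kept]

Because the segment boundary and the cap are fixed in advance, a fragment that straddles a boundary or a sparse group of duplicates just inside the cap can be misclassified, which is the false-positive behavior the abstract criticizes.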
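The fragment criterion at the heart of AEDefrag can be sketched as follows, again as a minimal illustration. The disk-model constants, the DataGroup fields, and the function names are hypothetical placeholders; the thesis specifies only the criterion itself, namely comparing a data group's effective read bandwidth against the user-expected bandwidth.

from dataclasses import dataclass

# Assumed disk model: placeholder values for a commodity HDD.
SEEK_TIME_S = 0.008         # average seek + rotational latency per random access
SEQ_BANDWIDTH_BPS = 120e6   # sequential transfer rate in bytes per second

@dataclass
class DataGroup:
    valid_bytes: int   # bytes of chunks the restore actually needs
    span_bytes: int    # physical extent that must be read to fetch them
    num_seeks: int     # random accesses needed to reach the group

def effective_bandwidth(g: DataGroup) -> float:
    """Valid bytes delivered per second of disk time: seek cost plus the
    time to stream the group's whole physical span."""
    t = g.num_seeks * SEEK_TIME_S + g.span_bytes / SEQ_BANDWIDTH_BPS
    return g.valid_bytes / t

def is_fragment_group(g: DataGroup, expected_bps: float) -> bool:
    """AEDefrag's criterion: the group's valid chunks count as data
    fragments (rewrite candidates) when their effective read bandwidth
    falls below the user-expected restore bandwidth."""
    return effective_bandwidth(g) < expected_bps

# Example: 1 MiB of needed data spread over a 16 MiB span, one seek.
# The effective bandwidth (~7 MB/s) is far below a 50 MB/s expectation,
# so the group's valid data would be flagged as fragments.
group = DataGroup(valid_bytes=1 << 20, span_bytes=16 << 20, num_seeks=1)
print(is_fragment_group(group, expected_bps=50e6))  # True

Because the group's extent is not fixed in advance, the same test can accept a large, densely packed group and reject a small, sparse one, which is how variable-sized, adaptively located groups avoid the fixed-window false positives.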
Keywords/Search Tags:Data Backup System, Deduplication, Data Fragmentation, Restore Performance, Defragmentation