
Study On Data Fragmentation For Data Backup Systems

Posted on: 2018-06-02    Degree: Master    Type: Thesis
Country: China    Candidate: J Wen    Full Text: PDF
GTID: 2348330536968747    Subject: Engineering
Abstract/Summary:
With data growing aggressively, data backup systems need to store more and more backup data. To save storage resources, data deduplication, a lossless data compression technology, is widely used in data backup systems. The basic principle of deduplication is to find and remove redundant chunks. However, in deduplication-based data backup systems, removing redundant chunks leaves logically adjacent data chunks physically scattered across different places on disk, which turns retrieval operations from sequential into random and thus significantly degrades restore performance. These restore-performance-degrading chunks are called data fragments.

A variety of defragmentation schemes have been proposed in the academic community, including the capping algorithm (Capping), the context-based rewrite algorithm (CBR), the history-aware rewrite algorithm (HAR), and inline deduplication for primary storage systems (iDedup). The main idea of these schemes is to identify and rewrite data fragments so that most logically adjacent data chunks are also adjacent in physical storage. However, through theoretical analysis and experimental verification, we found that these schemes cannot accurately identify data fragments. Capping, CBR, and HAR share a common assumption that every read operation involves a large fixed-size window of contiguous chunks; this restricts fragment identification to a fixed-size read window and leads to many false positive detections, since data fragments vary in size and can appear at different, unpredictable address locations. iDedup, although it does not assume a large fixed-size read window, ignores the disk characteristics that have a significant effect on restore performance; during restore, each of its disk operations reads only a small amount of data. As a result, the existing schemes (Capping, CBR, HAR, and iDedup) rewrite many data chunks yet improve restore performance only marginally.

Based on these observations, we propose a more accurate defragmentation solution, called AEDefrag, which uses variable-sized, adaptively located data groups instead of fixed-size read windows to accurately identify and effectively remove data fragments. The basic idea is to calculate the effective data transmission bandwidth achieved when restoring (reading) a data group: if that bandwidth is lower than the bandwidth the user expects, the valid data in the data group is identified as data fragments; otherwise, it is not. In our experiments, compared to the existing defragmentation schemes, AEDefrag improves the deduplication ratio by about 1% to 9% and restore performance by about 54% to 263%.
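To make the fixed-window assumption discussed above concrete, the sketch below shows a simplified, Capping-style rewrite decision in Python. It is a minimal illustration only: the cap value and the identifiers (cap_segment, CAP) are assumptions chosen for this sketch, not code from the thesis or from the original Capping work.

from collections import Counter

CAP = 10  # assumed cap on old containers referenced per segment (illustrative)

def cap_segment(chunks):
    """One fixed-size segment of the backup stream is given as a list of
    (fingerprint, container_id) pairs, with container_id None for new chunks.
    Keep references only to the CAP most-referenced old containers; duplicate
    chunks stored in any other container are rewritten."""
    counts = Counter(cid for _, cid in chunks if cid is not None)
    kept = {cid for cid, _ in counts.most_common(CAP)}
    # Fingerprints whose container fell outside the cap: rewrite them into
    # new containers so a restore touches at most CAP old containers.
    return [fp for fp, cid in chunks if cid is not None and cid not in kept]

Because the segment boundary and the cap are fixed in advance, a fragment that straddles a boundary or a sparse group of duplicates just inside the cap can be misclassified, which is the false-positive behavior the abstract criticizes.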
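The fragment criterion at the heart of AEDefrag can be sketched as follows, again as a minimal illustration. The disk-model constants, the DataGroup fields, and the function names are hypothetical placeholders; the thesis specifies only the criterion itself, namely comparing a data group's effective read bandwidth against the user-expected bandwidth.

from dataclasses import dataclass

# Assumed disk model: placeholder values for a commodity HDD.
SEEK_TIME_S = 0.008         # average seek + rotational latency per random access
SEQ_BANDWIDTH_BPS = 120e6   # sequential transfer rate in bytes per second

@dataclass
class DataGroup:
    valid_bytes: int   # bytes of chunks the restore actually needs
    span_bytes: int    # physical extent that must be read to fetch them
    num_seeks: int     # random accesses needed to reach the group

def effective_bandwidth(g: DataGroup) -> float:
    """Valid bytes delivered per second of disk time: seek cost plus the
    time to stream the group's whole physical span."""
    t = g.num_seeks * SEEK_TIME_S + g.span_bytes / SEQ_BANDWIDTH_BPS
    return g.valid_bytes / t

def is_fragment_group(g: DataGroup, expected_bps: float) -> bool:
    """AEDefrag's criterion: the group's valid chunks count as data
    fragments (rewrite candidates) when their effective read bandwidth
    falls below the user-expected restore bandwidth."""
    return effective_bandwidth(g) < expected_bps

# Example: 1 MiB of needed data spread over a 16 MiB span, one seek.
# The effective bandwidth (~7 MB/s) is far below a 50 MB/s expectation,
# so the group's valid data would be flagged as fragments.
group = DataGroup(valid_bytes=1 << 20, span_bytes=16 << 20, num_seeks=1)
print(is_fragment_group(group, expected_bps=50e6))  # True

Because the group's extent is not fixed in advance, the same test can accept a large, densely packed group and reject a small, sparse one, which is how variable-sized, adaptively located groups avoid the fixed-window false positives.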
Keywords/Search Tags:Data Backup System, Deduplication, Data Fragmentation, Restore Performance, Defragmentation