The Research Of The Optimization Of In-line Data De-duplication In Backup Systems

Posted on:2013-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:J Jin

Full Text:PDF

GTID:2248330392956879

Subject:Computer technology

Abstract/Summary:

Studies have found that up to 60% of the data stored in application systems is redundant, and this proportion grows over time. To improve storage efficiency and to save storage capacity and cost, data de-duplication technology emerged, and in recent years it has become a major research topic in storage and backup systems.

Traditional in-line data de-duplication faces an important issue: with a massive number of data fingerprints, the index lookup bottleneck has become a key limiting factor. Most existing research focuses on optimizing the indexing efficiency of the index server. A common phenomenon in real data backup applications, however, is that the data of each backup client shows strong local similarity across its daily or weekly backups. Exploiting this feature, a local fingerprint index lookup function that eases the pressure on the index server and reduces index lookup latency offers a new way to avoid the index lookup bottleneck.

This paper proposes a source-side indexing strategy for in-line, source-side de-duplication backup systems, which combines a source-side indexing mode with the traditional server indexing mode. In this mode, each backup client keeps a fingerprint table of its historical backup data, and when the source-side indexing mode is selected, the file daemon first looks up each fingerprint in this source-side fingerprint table.
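The two-tier lookup described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: fixed-size chunking, SHA-1 fingerprints, the 4 KiB sub-block size, and the `IndexServer` and `BackupClient` classes are all hypothetical assumptions. The client answers lookups from its local fingerprint table of earlier backups and contacts the index server only on a local miss.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed sub-block size; the thesis varies this


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size sub-blocks and return their SHA-1 fingerprints."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


class IndexServer:
    """Hypothetical stand-in for the central fingerprint index server."""

    def __init__(self) -> None:
        self.index: set[str] = set()
        self.lookups = 0  # number of lookups the server had to answer

    def lookup_and_insert(self, fp: str) -> bool:
        """Return True if fp is already stored; otherwise record it and return False."""
        self.lookups += 1
        if fp in self.index:
            return True
        self.index.add(fp)
        return False


class BackupClient:
    """Hypothetical client daemon holding a local table of fingerprints it has backed up."""

    def __init__(self, server: IndexServer) -> None:
        self.server = server
        self.local_table: set[str] = set()

    def backup(self, data: bytes) -> int:
        """Back up data and return the number of new sub-blocks that would be sent."""
        sent = 0
        for fp in chunk_fingerprints(data):
            if fp in self.local_table:
                continue  # duplicate found locally: no server round trip
            if not self.server.lookup_and_insert(fp):
                sent += 1  # unseen sub-block: would be transmitted to storage
            self.local_table.add(fp)
        return sent
```

For example, if a first backup holds four distinct 4 KiB sub-blocks and the next backup appends one new sub-block, the server answers 5 lookups in total in this sketch, instead of the 9 it would handle if every fingerprint were sent to it.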
The source-side strategy also applies a Bloom filter data structure and the principle of file similarity detection, and adds an indexing-mode selection module to in-line de-duplication. This module allows the backup client daemon to select the indexing mode in advance, according to the similarity between the file being backed up and the local index file. The strategy takes full advantage of this property of backup data to ease the pressure on the index server, save bandwidth, and improve backup efficiency.

Finally, this paper performs tests on the B-Cloud data backup system. The results show that as the local similarity of the backup data increases, the source-side indexing mode becomes more efficient than the traditional server indexing mode, and the more locally similar the backup data, the greater the efficiency of the source-side indexing mode. The paper also tested the source-side strategy in backup systems with different sub-block sizes and found that the smaller the sub-block size and the heavier the server indexing load, the more pronounced the effect of the source-side strategy.
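The indexing-mode selection module can be sketched in the same hedged way. The thesis does not specify its hash functions, filter size, sampling scheme, or threshold, so the double-hashing Bloom filter, the 1-in-8 fingerprint sampling, and the 0.5 similarity threshold below are assumptions, and `estimate_similarity` and `select_index_mode` are hypothetical names. The client estimates what fraction of the new file's fingerprints already appear in a Bloom filter built from its local index, and chooses source-side lookup only when that fraction is high.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits          # assumed size; must be a multiple of 8
        self.num_hashes = num_hashes      # assumed number of hash functions
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def estimate_similarity(fingerprints: list[str], local: BloomFilter,
                        sample_every: int = 8) -> float:
    """Fraction of sampled fingerprints the local filter reports as already seen."""
    sample = fingerprints[::sample_every]  # hypothetical 1-in-8 sampling
    if not sample:
        return 0.0
    return sum(fp in local for fp in sample) / len(sample)


def select_index_mode(fingerprints: list[str], local: BloomFilter,
                      threshold: float = 0.5) -> str:
    """Hypothetical selector: 'source' if the file looks similar enough, else 'server'."""
    if estimate_similarity(fingerprints, local) >= threshold:
        return "source"
    return "server"
```

A Bloom filter never reports a false negative but can report a false positive, so in this sketch it only guides the choice of indexing mode; a positive hit would still need to be confirmed against the exact local fingerprint table before a sub-block is skipped.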

Keywords/Search Tags:

Backup System, Data De-duplication, Fingerprint Indexing, Source-side Strategy

Related items

1	Research On Data De-duplication Based Real-time Backup And Recovery System
2	Research And Implementation Of The De-duplication Mechanism For The Mass Data Backup
3	Design And Implementation Of A File Backup System Based On Source De-duplication
4	Design And Implementation Of The Storage Server With Data De-duplication In Network Backup System
5	Research On Data De-duplication Technology In Network Backup
6	Sparse Indexing For File-Level De-duplication
7	Research And Implementation Of Data De-duplication Technology
8	Performance Optimization Of Data Recovery For Network Backup System Using Data De-duplication
9	Design And Implementation Of A Backup System Based On Data De-Duplication
10	Research Of Global Data De-duplication Technique In Backup System