The Research Of The Optimization Of In-line Data De-duplication In Backup Systems

Posted on: 2013-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Jin
Full Text: PDF
GTID: 2248330392956879
Subject: Computer technology
Abstract/Summary:
Studies have found that up to 60% of the data saved in application systems is redundant, and that this share grows over time. Data de-duplication technology emerged to improve storage efficiency and to save storage capacity and cost, and it has become a hot topic in research on storage and backup systems in recent years.

Traditional in-line data de-duplication faces an important problem: with a massive number of data fingerprints, index lookup becomes the key bottleneck. Most research concentrates on optimizing the indexing efficiency of the index server. In real backup applications, however, the data of a backup client commonly shows strong local similarity across its daily or weekly backups. Taking advantage of this feature, a local fingerprint index lookup function that eases the pressure on the index server and reduces index lookup delay offers a new way of avoiding the index lookup bottleneck.

This paper proposes a source-side indexing strategy that applies to in-line, source-side de-duplicating backup systems and combines a source-side indexing mode with the traditional server indexing mode. The strategy keeps a fingerprint table of each backup client's historical backup data on the client side. When the source-side indexing mode is selected, the file daemon first searches for a fingerprint in the source-side fingerprint table.

The source-side strategy also applies the Bloom filter data structure and the principle of file similarity detection, and adds an indexing mode selection module to in-line de-duplication. This module lets the backup client daemon select the indexing mode in advance, according to the similarity between the backup file and the local index file. The strategy takes full advantage of this feature of backup data to ease the pressure on the index server, save bandwidth, and improve backup efficiency.

Finally, this paper runs tests on the B-Cloud data backup system. The results show that as the local similarity of the backup data increases, the source-side indexing mode becomes more efficient than the traditional index server indexing mode, and the higher the local similarity, the more efficiently the source-side indexing mode performs. The paper also tests the source-side strategy in backup systems with different sub-block sizes and finds that the smaller the sub-block size and the heavier the server indexing load, the more pronounced the effect of the source-side strategy.
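The lookup path described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class and function names (`BloomFilter`, `SourceSideIndex`, `backup`), the use of SHA-1 over fixed-size sub-blocks, the sampling-based similarity estimate, and the 0.5 threshold are all assumptions made for illustration, and the index server is stood in for by a plain in-memory set.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


def fingerprints(data, block_size):
    """Split data into fixed-size sub-blocks and fingerprint each one."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


class SourceSideIndex:
    """Hypothetical client-side fingerprint table guarded by a Bloom filter."""

    def __init__(self):
        self.table = set()
        self.bloom = BloomFilter()

    def add(self, fp):
        self.table.add(fp)
        self.bloom.add(fp)

    def contains(self, fp):
        # The Bloom filter rejects most unseen fingerprints cheaply;
        # the table lookup rules out false positives.
        return fp in self.bloom and fp in self.table

    def similarity(self, fps, sample_every=4):
        """Estimate the fraction of sampled fingerprints already seen locally."""
        sample = fps[::sample_every]
        if not sample:
            return 0.0
        return sum(fp in self.bloom for fp in sample) / len(sample)


def backup(data, local, server_index, block_size=4, threshold=0.5):
    """Back up one file; return (mode, server_lookups, new_blocks)."""
    fps = fingerprints(data, block_size)
    # Mode selection: use the local index only if the file looks similar
    # enough to what this client has backed up before (assumed threshold).
    mode = "source" if local.similarity(fps) >= threshold else "server"
    server_lookups = new_blocks = 0
    for fp in fps:
        if mode == "source" and local.contains(fp):
            continue  # duplicate resolved locally, no server round trip
        server_lookups += 1
        if fp not in server_index:
            server_index.add(fp)
            new_blocks += 1
        local.add(fp)
    return mode, server_lookups, new_blocks
```

In this sketch, a first backup of a file finds nothing in the local index and falls back to server mode for every sub-block. A later backup of a mostly unchanged file selects source mode and contacts the server only for sub-blocks the client has not seen, which is the effect the abstract attributes to local similarity.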
Keywords/Search Tags: Backup System, Data De-duplication, Fingerprint Indexing, Source-side Strategy