
The Design And Implementation Of Data Deduplication Index Server

Posted on: 2013-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Q Song
GTID: 2248330392457790
Subject: Computer system architecture
Abstract/Summary:
Data deduplication is a key technology in current storage systems: it can significantly save storage space and dramatically reduce the amount of data transmitted over the network. However, when the volume of data is large, memory is not big enough to store all the fingerprint indexes, so disk access is inevitable. How to design a system architecture that minimizes disk access, improves the efficiency of the core index server, increases system throughput, and balances memory usage is becoming an important topic in backup system research.

HDIS, a High-speed data De-duplication Index Server, uses a Bloom filter and a double cache mechanism to preserve and exploit the locality of the data stream. To locate the disk position quickly on a cache miss, we propose a second hash and reverse map method that establishes the mapping from a fingerprint to its disk position. When combined with a uniform sampling algorithm, the system can significantly reduce memory usage. To eliminate unnecessary disk access and handle hash collisions, we use a local exclusion strategy, which further reduces disk access and improves system efficiency.

Experimental results show that HDIS is highly efficient (the average processing speed exceeds 500,000 fingerprints per second) in environments with large amounts of data (TB level), while maintaining low memory usage and preserving the deduplication ratio. HDIS is an efficient and stable deduplication index server with excellent scalability.
Keywords/Search Tags: Data Deduplication Index, Dual Cache, Second Hash, Reverse Map, Sampling
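
The general lookup path the abstract outlines (Bloom filter first, then an in-memory cache, then disk only on a miss) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the HDIS implementation: the class names `BloomFilter` and `ToyIndex`, the container size, the single LRU cache, and the in-memory `disk_location` dictionary (which merely stands in for whatever structure maps a fingerprint to its disk position) are all hypothetical. The thesis's second hash, reverse map, sampling, and local exclusion strategy are not modelled.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Small Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ToyIndex:
    """Hypothetical fingerprint index: Bloom filter + LRU cache of containers."""

    def __init__(self, cache_containers=2, container_size=4):
        self.bloom = BloomFilter()
        self.cache = OrderedDict()   # container id -> set of fingerprints (LRU order)
        self.cache_containers = cache_containers
        self.container_size = container_size
        self.disk_containers = []    # simulated on-disk containers
        self.disk_location = {}      # fingerprint -> container id (stand-in for an on-disk index)
        self.open_container = []     # fingerprints not yet flushed to "disk"
        self.disk_reads = 0          # counts simulated container loads

    def _store(self, fp):
        self.bloom.add(fp)
        self.open_container.append(fp)
        if len(self.open_container) == self.container_size:
            cid = len(self.disk_containers)
            self.disk_containers.append(self.open_container)
            for stored in self.open_container:
                self.disk_location[stored] = cid
            self.open_container = []

    def _load(self, cid):
        # Loading a whole container prefetches its neighbours (stream locality).
        self.disk_reads += 1
        self.cache[cid] = set(self.disk_containers[cid])
        self.cache.move_to_end(cid)
        while len(self.cache) > self.cache_containers:
            self.cache.popitem(last=False)  # evict least recently used

    def is_duplicate(self, fp):
        if fp not in self.bloom:            # definitely new: no disk access needed
            self._store(fp)
            return False
        if fp in self.open_container:
            return True
        hit = next((cid for cid, fps in self.cache.items() if fp in fps), None)
        if hit is not None:                 # cache hit
            self.cache.move_to_end(hit)
            return True
        cid = self.disk_location.get(fp)
        if cid is None:                     # Bloom filter false positive
            self._store(fp)
            return False
        self._load(cid)                     # cache miss: one simulated disk read
        return True


index = ToyIndex()
stream = [f"fp{i}".encode() for i in range(8)]
first_pass = [index.is_duplicate(fp) for fp in stream]
second_pass = [index.is_duplicate(fp) for fp in stream]
print(first_pass, second_pass, index.disk_reads)
```

In this toy run the second pass over the same eight fingerprints needs only two simulated container loads, because each load brings the neighbouring fingerprints of the stream into the cache.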