The Design And Implementation Of Data Deduplication Index Server

Posted on:2013-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:B Q Song

Full Text:PDF

GTID:2248330392457790

Subject:Computer system architecture

Abstract/Summary:

As a key technology in current storage systems, data deduplication can significantly save storage space and dramatically reduce the volume of data transmitted over the network. However, with large amounts of data, memory cannot hold all the fingerprint indexes, so disk access is inevitable. How to design a system architecture that minimizes disk access, improves the efficiency of the core index server, increases system throughput, and balances memory usage has become an important topic in backup system research.

HDIS, a High-speed data De-duplication Index Server, uses a Bloom filter and a double-cache mechanism to preserve and exploit the locality of the data stream. To locate the disk position quickly on a cache miss, we propose a second-hash and reverse-map method that establishes the mapping from fingerprint to disk position. Combined with a uniform sampling algorithm, the system can significantly reduce memory usage. To eliminate unnecessary disk accesses and handle hash collisions, we use a local exclusion strategy, which further reduces disk access and improves system efficiency.

Experimental results show that HDIS achieves high efficiency (an average processing speed of more than 500,000 fingerprints per second) on large data sets (TB level) while maintaining low memory usage and preserving the deduplication ratio. HDIS is an efficient and stable deduplication index server with excellent scalability.
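
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of placing a Bloom filter and an in-memory cache in front of an on-disk fingerprint index. It is not the HDIS design: it uses a single LRU cache rather than the double cache, and it omits the second hash, reverse map, sampling, and local exclusion steps. All names (`BloomFilter`, `IndexSketch`, `lookup_or_insert`, `disk_reads`) and all sizes are hypothetical, and a Python dict stands in for the disk.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Fixed-size Bloom filter over byte-string fingerprints (illustrative sizes)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


class IndexSketch:
    """Hypothetical index: Bloom filter, then LRU cache, then simulated disk."""

    def __init__(self, cache_size=1024):
        self.bloom = BloomFilter()
        self.cache = OrderedDict()   # LRU cache: fingerprint -> chunk location
        self.cache_size = cache_size
        self.disk = {}               # stand-in for the on-disk index
        self.disk_reads = 0          # counts simulated disk index reads

    def _cache_put(self, fp, loc):
        self.cache[fp] = loc
        self.cache.move_to_end(fp)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used entry

    def _store_new(self, fp, loc):
        self.bloom.add(fp)
        self.disk[fp] = loc
        self._cache_put(fp, loc)

    def lookup_or_insert(self, fp, loc):
        """Return (is_duplicate, location) for fingerprint fp."""
        if not self.bloom.might_contain(fp):
            # Definitely unseen: no disk read is needed.
            self._store_new(fp, loc)
            return False, loc
        if fp in self.cache:
            self.cache.move_to_end(fp)
            return True, self.cache[fp]
        # Cache miss on a possibly seen fingerprint: consult the disk index.
        self.disk_reads += 1
        found = self.disk.get(fp)
        if found is None:
            # Bloom filter false positive: treat the fingerprint as new.
            self._store_new(fp, loc)
            return False, loc
        self._cache_put(fp, found)
        return True, found
```

In this sketch, a new fingerprint is rejected by the Bloom filter without touching the disk, and a repeated fingerprint costs a disk read only when it has already been evicted from the cache. That is the kind of disk access the abstract aims to minimize.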

Keywords/Search Tags:

Data Deduplication Index, Dual Cache, Second Hash, Reverse Map, Sampling

Related items

1	Research On High I/O Performance Data Deduplication In Primary Storage System
2	Cache And Index Key Technology Research Based On LSM-tree
3	Research On Key Technologies Of Application-Aware Data Deduplication
4	Research On Key Technologies Of Data Deduplication For Backup System
5	Research Of Data Deduplication Technology On Hadoop Distributed System
6	Research On Data Deduplication Based On File Access Patterns
7	Design And Research On A High-performance Deduplication System
8	High Performance Data Deduplication Mechanisms For Data Centers
9	A Design Of Image-Oriented Cloud Storage Data Deduplication
10	An Efficient Data Deduplication Design with Flash-Memory Based Solid State Drive