
Research On Routing Mechanism For Cluster Deduplication System

Posted on: 2015-12-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y X Xing    Full Text: PDF
GTID: 2348330509460884    Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of the global volume of data, managing and protecting data has become a major challenge for individuals and data centers alike. Deduplication, an efficient data reduction technique, is widely used in the field of data backup. As systems and data volumes have rapidly grown in scale, cluster deduplication systems have emerged. However, we must solve not only the disk bottleneck of fingerprint indexing and the limited resources of personal devices, but also the storage node "information island" problem that arises when each node deduplicates independently. In view of these problems and challenges, we propose two routing mechanisms that optimize a cluster deduplication system for its specific application environment. In summary, the main contributions and innovations of this thesis are as follows:

We designed and implemented a cluster deduplication prototype system consisting of a backup terminal, a metadata management server, and several cluster deduplication server nodes. The backup terminal partitions data streams into smaller parts using a static chunking strategy; we call these parts chunks and represent each one by its fingerprint (e.g., its SHA-1 value). Chunks are then aggregated into super chunks to reduce the communication overhead of routing. The metadata management server manages backup sessions and stores the metadata of the backup files. The cluster deduplication server nodes are responsible for fingerprint indexing and chunk storage.

We then propose AR-Dedupe, an application-aware routing mechanism that reduces communication overhead during the backup process. AR-Dedupe introduces a routing server that maintains the super chunk routing table and the load information of the cluster's deduplication nodes. As a result, it achieves a high global data reduction ratio while keeping the load well balanced. Moreover, we build several super chunk index tables, one per data application, to speed up handprint indexing.

Finally, we put forward HB-Dedupe, a history-based consistent hash routing policy for cluster deduplication systems, aimed at backing up personal computing devices in a cloud environment. HB-Dedupe identifies hot chunk fingerprints and caches them in the backup terminals. Because the buffer size is fixed, fingerprints are replaced with an LRU algorithm, and before sending fingerprint requests we first look them up in the terminal's buffer. Experimental results on three real datasets show that HB-Dedupe can decrease fingerprint indexing requests by 20%~80% with only a small overhead at the backup terminals.
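To make the prototype's chunking pipeline concrete, the following is a minimal Python sketch, assuming a fixed 8 KB chunk size and a fixed number of chunks per super chunk; the thesis does not specify either value, so CHUNK_SIZE and SUPER_CHUNK_FANOUT are illustrative names:

    import hashlib

    CHUNK_SIZE = 8 * 1024        # assumed fixed chunk size (static chunking)
    SUPER_CHUNK_FANOUT = 512     # assumed number of chunks per super chunk

    def chunk_and_fingerprint(stream):
        """Split a data stream into fixed-size chunks and represent
        each chunk by its SHA-1 fingerprint."""
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest(), chunk

    def super_chunks(fingerprinted_chunks):
        """Aggregate consecutive chunks into super chunks so routing
        cost is paid once per super chunk, not once per chunk."""
        batch = []
        for fp, chunk in fingerprinted_chunks:
            batch.append((fp, chunk))
            if len(batch) == SUPER_CHUNK_FANOUT:
                yield batch
                batch = []
        if batch:
            yield batch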
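A minimal sketch of one plausible AR-Dedupe routing decision follows, under the assumption that each super chunk is summarized by a handprint (a few representative fingerprints) and that the routing server prefers the node that has already indexed most of those fingerprints, falling back to the least loaded node; the class and field names (RoutingServer, tables, load) are hypothetical, not taken from the thesis:

    class RoutingServer:
        """Sketch of an application-aware routing server: one super chunk
        index table per application type, plus per-node load counters."""

        def __init__(self, nodes):
            self.tables = {}                 # app_type -> {fingerprint -> node}
            self.load = {n: 0 for n in nodes}

        def route(self, app_type, handprint):
            table = self.tables.setdefault(app_type, {})
            # Prefer the node that has seen the most handprint fingerprints,
            # which favors a high global data reduction ratio.
            votes = {}
            for fp in handprint:
                node = table.get(fp)
                if node is not None:
                    votes[node] = votes.get(node, 0) + 1
            if votes:
                target = max(votes, key=votes.get)
            else:
                # No routing history: pick the least loaded node (load balancing).
                target = min(self.load, key=self.load.get)
            for fp in handprint:
                table[fp] = target
            self.load[target] += 1
            return target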
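Likewise, a minimal sketch of HB-Dedupe's terminal-side pieces, assuming a standard consistent hash ring over the server nodes and a fixed-capacity LRU cache of hot fingerprints; the names ConsistentHashRing and FingerprintCache are illustrative:

    import hashlib
    from bisect import bisect_right
    from collections import OrderedDict

    class ConsistentHashRing:
        """Maps a fingerprint to a deduplication server node; virtual
        nodes smooth the distribution across physical nodes."""

        def __init__(self, nodes, vnodes=64):
            self.ring = sorted(
                (int(hashlib.sha1(f"{n}#{i}".encode()).hexdigest(), 16), n)
                for n in nodes for i in range(vnodes)
            )
            self.keys = [k for k, _ in self.ring]

        def node_for(self, fingerprint):
            idx = bisect_right(self.keys, int(fingerprint, 16)) % len(self.ring)
            return self.ring[idx][1]

    class FingerprintCache:
        """Fixed-size LRU cache of hot chunk fingerprints kept at the
        backup terminal; a hit avoids a fingerprint indexing request."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.cache = OrderedDict()

        def contains(self, fp):
            if fp in self.cache:
                self.cache.move_to_end(fp)      # refresh recency on a hit
                return True
            return False

        def insert(self, fp):
            self.cache[fp] = True
            self.cache.move_to_end(fp)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used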
Keywords/Search Tags:Cluster Deduplication System, Routing Mechanism, Communication Overhead, Load Balancing