
Research On Routing Mechanism For Cluster Deduplication System

Posted on: 2015-12-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y X Xing    Full Text: PDF
GTID: 2348330509460884    Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of the global volume of data, managing and protecting data has become a major challenge for individuals and data centers alike. Deduplication, an efficient data reduction technique, is widely used in the field of data backup. As systems and data volumes have rapidly grown in scale, cluster deduplication systems have emerged. However, we must solve not only the disk bottleneck of fingerprint indexing and the limited resources of personal devices, but also the storage node "information island" problem that arises when each node deduplicates independently. In view of these problems and challenges, we propose two routing mechanisms that optimize a cluster deduplication system for its specific application environment. In summary, the main contributions and innovations of this thesis are as follows:

We designed and implemented a cluster deduplication prototype system consisting of a backup terminal, a metadata management server, and several cluster deduplication server nodes. The backup terminal partitions data streams into smaller parts using a static chunking strategy; we call these parts chunks and represent each one by its fingerprint (e.g., its SHA-1 value). Chunks are then aggregated into super chunks to reduce the communication overhead of routing. The metadata management server manages backup sessions and stores the metadata of the backup files. The cluster deduplication server nodes are responsible for fingerprint indexing and chunk storage.

We then propose AR-Dedupe, an application-aware routing mechanism that reduces communication overhead during the backup process. AR-Dedupe introduces a routing server that maintains the super chunk routing table and the load information of the cluster's deduplication nodes. As a result, it achieves a high global data reduction ratio while keeping the load well balanced. Moreover, we build several super chunk index tables, one per data application, to speed up handprint indexing.

Finally, we put forward HB-Dedupe, a history-based consistent hash routing policy for cluster deduplication systems, aimed at backing up personal computing devices in a cloud environment. HB-Dedupe identifies hot chunk fingerprints and caches them in the backup terminals. Because the buffer size is fixed, fingerprints are replaced with an LRU algorithm, and before sending fingerprint requests we first look them up in the terminal's buffer. Experimental results on three real datasets show that HB-Dedupe can decrease fingerprint indexing requests by 20%~80% with only a small overhead at the backup terminals.
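To make the prototype's chunking pipeline concrete, the following is a minimal Python sketch, assuming a fixed 8 KB chunk size and a fixed number of chunks per super chunk; the thesis does not specify either value, so CHUNK_SIZE and SUPER_CHUNK_FANOUT are illustrative names:

    import hashlib

    CHUNK_SIZE = 8 * 1024        # assumed fixed chunk size (static chunking)
    SUPER_CHUNK_FANOUT = 512     # assumed number of chunks per super chunk

    def chunk_and_fingerprint(stream):
        """Split a data stream into fixed-size chunks and represent
        each chunk by its SHA-1 fingerprint."""
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest(), chunk

    def super_chunks(fingerprinted_chunks):
        """Aggregate consecutive chunks into super chunks so routing
        cost is paid once per super chunk, not once per chunk."""
        batch = []
        for fp, chunk in fingerprinted_chunks:
            batch.append((fp, chunk))
            if len(batch) == SUPER_CHUNK_FANOUT:
                yield batch
                batch = []
        if batch:
            yield batch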
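A minimal sketch of one plausible AR-Dedupe routing decision follows, under the assumption that each super chunk is summarized by a handprint (a few representative fingerprints) and that the routing server prefers the node that has already indexed most of those fingerprints, falling back to the least loaded node; the class and field names (RoutingServer, tables, load) are hypothetical, not taken from the thesis:

    class RoutingServer:
        """Sketch of an application-aware routing server: one super chunk
        index table per application type, plus per-node load counters."""

        def __init__(self, nodes):
            self.tables = {}                 # app_type -> {fingerprint -> node}
            self.load = {n: 0 for n in nodes}

        def route(self, app_type, handprint):
            table = self.tables.setdefault(app_type, {})
            # Prefer the node that has seen the most handprint fingerprints,
            # which favors a high global data reduction ratio.
            votes = {}
            for fp in handprint:
                node = table.get(fp)
                if node is not None:
                    votes[node] = votes.get(node, 0) + 1
            if votes:
                target = max(votes, key=votes.get)
            else:
                # No routing history: pick the least loaded node (load balancing).
                target = min(self.load, key=self.load.get)
            for fp in handprint:
                table[fp] = target
            self.load[target] += 1
            return target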
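Likewise, a minimal sketch of HB-Dedupe's terminal-side pieces, assuming a standard consistent hash ring over the server nodes and a fixed-capacity LRU cache of hot fingerprints; the names ConsistentHashRing and FingerprintCache are illustrative:

    import hashlib
    from bisect import bisect_right
    from collections import OrderedDict

    class ConsistentHashRing:
        """Maps a fingerprint to a deduplication server node; virtual
        nodes smooth the distribution across physical nodes."""

        def __init__(self, nodes, vnodes=64):
            self.ring = sorted(
                (int(hashlib.sha1(f"{n}#{i}".encode()).hexdigest(), 16), n)
                for n in nodes for i in range(vnodes)
            )
            self.keys = [k for k, _ in self.ring]

        def node_for(self, fingerprint):
            idx = bisect_right(self.keys, int(fingerprint, 16)) % len(self.ring)
            return self.ring[idx][1]

    class FingerprintCache:
        """Fixed-size LRU cache of hot chunk fingerprints kept at the
        backup terminal; a hit avoids a fingerprint indexing request."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.cache = OrderedDict()

        def contains(self, fp):
            if fp in self.cache:
                self.cache.move_to_end(fp)      # refresh recency on a hit
                return True
            return False

        def insert(self, fp):
            self.cache[fp] = True
            self.cache.move_to_end(fp)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used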
Keywords/Search Tags:Cluster Deduplication System, Routing Mechanism, Communication Overhead, Load Balancing