Backup, archiving, and other centralized storage systems hold large amounts of redundant data, so eliminating this redundancy to save storage space has become an urgent problem. Data de-duplication, a technique that has recently attracted wide attention, significantly reduces the storage space and network bandwidth required by traditional data protection techniques. Based on an in-depth study of data de-duplication technology, this paper presents the design and implementation of a distributed data de-duplication system. First, the data de-duplication mechanism for mass data backup is studied from three aspects: de-duplication manner, storage strategy, and access strategy. Second, building on this study, a distributed data de-duplication solution is proposed whose core strategies are a double-server model, inertia visit, Hash hierarchical access, and local storage. Finally, the Distributed Data De-duplication Disaster Backup System is implemented, and the development and prospects of data de-duplication technology are summarized on the basis of an analysis of the test results. The results of running the system show that applying data de-duplication technology eliminates a large share of the redundant data, saves storage space, and improves the operating efficiency of the backup system.
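For readers unfamiliar with the basic idea, hash-based de-duplication can be illustrated with a minimal sketch. This is not the system described in this paper: the fixed 4096-byte chunk size, the use of SHA-256, the in-memory dictionary, and the names `DedupStore`, `put`, `get`, and `stored_bytes` are all assumptions chosen for illustration, and the double-server model, inertia visit, and Hash hierarchical access strategies are not modeled here.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the paper does not specify one


class DedupStore:
    """Hypothetical in-memory chunk store keyed by content hash."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Split data into fixed-size chunks and store each unique chunk once.

        Returns the list of digests (a "recipe") needed to rebuild the data.
        """
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk whose digest is already present is not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Total bytes physically held after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

In this sketch, backing up the same data twice adds nothing to `stored_bytes()`, because every chunk digest is already present; only the recipe (the list of digests) is new.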