Research On Online De-duplication Aimed At Files

Posted on:2017-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Hu

Full Text:PDF

GTID:2348330503472461

Subject:Computer technology

Abstract/Summary:

The rapid development of the Internet has led to explosive growth of information. The amount of data produced on the Internet now reaches the exabyte (EB) level every day, and storing and managing it is a major challenge for individuals and companies alike. Although many types of storage systems exist and storage capacity keeps increasing, storing all of this information without any screening is clearly not a wise choice. Data de-duplication has attracted attention as an effective solution. It is already widely used in backup systems, where the technology is mature. In online systems, however, de-duplication is still rarely used because of their special characteristics, especially their real-time requirements, so it has more problems to solve there.

Pbfs is a middleware layer that can be applied to all major file systems, and it adopts several solutions tailored to the characteristics of online systems. First, Pbfs introduces file classification, processing each type of file in the way best suited to it. Second, Pbfs improves the similarity determination algorithm to increase recognition accuracy. Finally, Pbfs makes the metadata classes dynamic; metadata is the most important part of a de-duplication system, and making it dynamic can improve the de-duplication ratio. The goal of these solutions is to minimize the computational overhead of the system while improving the de-duplication ratio and related metrics.

Test results on various data sets show that Pbfs performs well compared with ZFS de-duplication and iDedup. The improvement in latency is especially clear, which is very attractive for online systems. Pbfs is built on ProSy, with ProSy's similarity determination algorithm and basic data structures optimized and improved. Compared with ProSy, Pbfs achieves a higher de-duplication ratio and significantly better read and write throughput, reaching the expected results.
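
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of type-aware de-duplication, not the Pbfs design. The extension list, the 4 KiB chunk size, the DedupStore class, and the Jaccard-based similarity function are all assumptions made for illustration: already-compressed file types are fingerprinted as a whole, other files are split into fixed-size chunks, and similarity is estimated from shared fingerprints.

```python
import hashlib
import os

# Assumption: file types that are de-duplicated as whole files.
# The thesis does not list its actual categories.
WHOLE_FILE_TYPES = {".zip", ".gz", ".jpg", ".mp4"}
CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


def split_units(name: str, data: bytes) -> list:
    """Choose the de-duplication granularity from the file type."""
    ext = os.path.splitext(name)[1].lower()
    if ext in WHOLE_FILE_TYPES or not data:
        return [data]
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class DedupStore:
    """Hypothetical in-memory store that keeps each unique unit once."""

    def __init__(self):
        self.blocks = {}        # fingerprint -> unit bytes
        self.logical_bytes = 0  # bytes written by callers

    def write(self, name: str, data: bytes) -> list:
        """Store a file and return its recipe (ordered fingerprints)."""
        recipe = []
        for unit in split_units(name, data):
            fp = hashlib.sha256(unit).hexdigest()
            self.blocks.setdefault(fp, unit)  # skip units already stored
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(unit) for unit in self.blocks.values())


def similarity(recipe_a: list, recipe_b: list) -> float:
    """Jaccard similarity of two recipes' fingerprint sets."""
    a, b = set(recipe_a), set(recipe_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

In this sketch, writing two text files that share one of their two chunks stores three chunks instead of four, and the ratio logical_bytes / stored_bytes() gives a simple de-duplication ratio for the store.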

Keywords/Search Tags:

Online De-duplication, File Classification, Similarity

Related items

1	Design And Implementation Of A File Backup System Based On Source De-duplication
2	Research On File Similarity-Based Deduplication In Network Backup
3	Design And Implementation Of The Storage Server With Data De-duplication In Network Backup System
4	Design And Implementation Of A Backup System Based On Data De-Duplication
5	Research On A File-level Data Reduplication Approach In Cloud Storage Systems
6	The Design And Implementation Of A De-duplication File System Based On Cloud Storage
7	Sparse Indexing For File-Level De-duplication
8	Research And Application On De-duplication Technology For Cloud Storage
9	Design And Implementation Of File Synchronization System
10	Key Technologies Of File Secret Classification Mark In Kernel-level