The Application And Implementation Of Deduplication Based On Identical Techniques In Storage System

Posted on:2015-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:F Y Yang

Full Text:PDF

GTID:2298330467463531

Subject:Computer Science and Technology

Abstract/Summary:

Big data is currently one of the hottest topics in science and technology, and data is its foundation. With the rapid development of information technology, the amount of data doubles every two years, and this exponential growth widens the gap between storage demand and supply. Studies show that backup, archive, version management, e-mail, and other systems contain a large amount of redundant data, and transmitting and storing it greatly increases enterprise costs.
Deduplication can effectively reduce the cost of storage equipment and operation by detecting and eliminating duplicate data globally, both within and across files. Many problems in data deduplication research remain open and worth studying, such as increasing the deduplication ratio, index design, and data reliability.
This paper presents the research and implementation of deduplication technology, based on identical-data detection techniques, applied to a storage system. Based on analysis and experiment, files are classified into different data sets according to their types and are chunked using different algorithms. Experiments show that this method improves system performance with only a slight effect on the deduplication ratio. In systems that use data chunking algorithms, most researchers adopt a container structure to store data blocks. However, correlated data blocks become scattered across different containers over time, which reduces the locality of correlated data blocks. This paper proposes an algorithm based on the fingerprint similarity between the data block stream and containers, which can effectively increase fingerprint query locality by grouping highly related blocks together. In addition, the speed of fingerprint queries is improved with techniques such as classified indexes, multi-level indexes, multithreading, and Bloom filters.
To address the reduction in data reliability caused by deduplication, this paper puts forward a scheme that combines enterprise storage with cloud storage and can simply and effectively improve system reliability.
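
The container-placement idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm, whose details are not given here. It assumes SHA-1 fingerprints, measures similarity as the fraction of a segment's fingerprints already present in a container, and uses a hypothetical similarity threshold and container capacity. All names (`Container`, `similarity`, `store_segment`) are invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(block: bytes) -> str:
    """SHA-1 digest used as the block fingerprint (an assumed choice)."""
    return hashlib.sha1(block).hexdigest()


@dataclass
class Container:
    capacity: int  # hypothetical limit on the number of blocks per container
    fingerprints: set = field(default_factory=set)

    def is_full(self) -> bool:
        return len(self.fingerprints) >= self.capacity


def similarity(segment_fps: set, container: Container) -> float:
    """Fraction of the segment's fingerprints already held by the container."""
    if not segment_fps:
        return 0.0
    return len(segment_fps & container.fingerprints) / len(segment_fps)


def store_segment(blocks, containers, capacity=1024, threshold=0.5):
    """Deduplicate one segment of the block stream and place its new blocks.

    New blocks go to the open container most similar to the segment, so
    related blocks stay together. If no open container reaches the
    threshold, a fresh container is started. Returns the number of blocks
    actually written.
    """
    fps = {fingerprint(b) for b in blocks}
    known = set().union(*(c.fingerprints for c in containers))
    new_fps = fps - known
    if not new_fps:
        return 0  # every block in the segment is a duplicate

    open_containers = [c for c in containers if not c.is_full()]
    best = max(open_containers, key=lambda c: similarity(fps, c), default=None)
    if best is None or similarity(fps, best) < threshold:
        best = Container(capacity)
        containers.append(best)

    for fp in new_fps:
        if best.is_full():  # spill over into a new container when full
            best = Container(capacity)
            containers.append(best)
        best.fingerprints.add(fp)
    return len(new_fps)
```

In this sketch, a segment that shares most of its blocks with an existing container has its few new blocks appended to that container rather than to whichever container happens to be open, which is one simple way to keep later fingerprint lookups for related data local.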

Keywords/Search Tags:

storage deduplication, index techniques, data reliability

Related items

1	Research On Data Deduplication Technology In Network Storage System
2	Research On Key Techniques Of Data Deduplication In The Environment Of Big Data
3	Research On Deduplication Technology In Cloud Storage
4	Research On Key Technologies Of Resources Management In Cloud Storage System
5	Secure And Dynamic Audit Cloud Storage System With Deduplication
6	HTDRDedu:The Design And Implementation Of A Distributed Backup Data Deduplication System
7	Deduplication Research In Cloud Storage Environment
8	Research And Implementation Of Ciphertext Deduplication In Cloud Storage
9	Research On Building Efficient Data Deduplication Storage Systems For Data Backup
10	Research On High I/O Performance Data Deduplication In Primary Storage System