The Application And Implementation Of Deduplication Based On Identical Techniques In Storage System

Posted on: 2015-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Yang
GTID: 2298330467463531
Subject: Computer Science and Technology
Abstract/Summary:
Big data is currently one of the hottest topics in science and technology, and data is its foundation. With the rapid development of information technology, the amount of data doubles roughly every two years. This exponential growth widens the gap between storage demand and supply. Studies show that backup, archiving, version management, e-mail, and similar systems contain large amounts of redundant data. Transmitting and storing this data greatly increases enterprise costs. Deduplication can effectively reduce the cost of storage equipment and operations by globally detecting and eliminating duplicate data, both within and across files. Many problems in data deduplication remain valuable research topics, such as increasing the deduplication ratio, designing the index, and maintaining data reliability.

This paper presents the research and implementation of a deduplication technique for storage systems based on identical-data detection. Based on analysis and experiments, files are classified into different data sets according to their types, and each set is chunked with a different algorithm. Experiments show that this method improves system performance with only a slight effect on the deduplication ratio.

In systems that use data chunking algorithms, most researchers adopt the tactic of storing data blocks in container structures. Over time, however, correlated data blocks become scattered across different containers, which reduces their locality. This paper proposes an algorithm based on the fingerprint similarity between a data block stream and a container. By grouping highly related blocks together, it effectively increases the locality of fingerprint queries. In addition, fingerprint query speed is improved with techniques such as a classified index, a multi-level index, multithreading, and Bloom filters.
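The abstract does not specify the chunking algorithms, index layout, or similarity measure. The following Python sketch is therefore illustrative only and rests on several assumptions:

- **Chunking policy (hypothetical, by file extension):**
  - whole-file chunks for already-compressed formats;
  - fixed-size blocks for disk images;
  - gear-hash content-defined chunking for everything else.
- **Fingerprints:** SHA-1 chunk fingerprints.
- **Index:** an in-memory dict stands in for the fingerprint index, guarded by a Bloom filter.
- **Similarity:** Jaccard similarity stands in for the fingerprint-similarity measure between a block stream and a container.

The names `chunk_file`, `DedupStore`, and `most_similar_container` are hypothetical, as are all parameter values.

```python
import hashlib
import os

# Hypothetical type policy: compressed formats have little intra-file redundancy,
# so they are kept whole; block-aligned images use fixed-size chunks.
WHOLE_FILE_TYPES = {".zip", ".gz", ".jpg", ".mp3", ".mp4"}
FIXED_SIZE_TYPES = {".img", ".vmdk"}

# Deterministic 32-bit "gear" table for the rolling hash.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def fixed_chunks(data, size=4096):
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data, mask=0x1FFF, min_size=2048, max_size=65536):
    """Content-defined chunking: cut where the masked gear hash is zero,
    subject to minimum and maximum chunk sizes."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_file(name, data):
    """Choose a chunking algorithm from the file extension (hypothetical policy)."""
    ext = os.path.splitext(name)[1].lower()
    if ext in WHOLE_FILE_TYPES:
        return [data]
    if ext in FIXED_SIZE_TYPES:
        return fixed_chunks(data)
    return cdc_chunks(data)


class BloomFilter:
    """Bit-array Bloom filter; k positions are sliced from the SHA-1 fingerprint."""

    def __init__(self, m_bits=1 << 20, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, fp):
        for j in range(self.k):  # 4 bytes per position; k <= 5 for a 20-byte digest
            yield int.from_bytes(fp[4 * j:4 * j + 4], "big") % self.m

    def add(self, fp):
        for p in self._positions(fp):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, fp):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(fp))


class DedupStore:
    """Stores each unique chunk once; files are kept as lists of fingerprints."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}    # fingerprint -> chunk (stands in for the on-disk index)
        self.recipes = {}  # file name -> ordered fingerprints

    def put(self, name, data):
        recipe = []
        for chunk in chunk_file(name, data):
            fp = hashlib.sha1(chunk).digest()
            # A Bloom-filter miss proves the chunk is new, so the index probe is skipped.
            if not (self.bloom.might_contain(fp) and fp in self.index):
                self.index[fp] = chunk
                self.bloom.add(fp)
            recipe.append(fp)
        self.recipes[name] = recipe

    def get(self, name):
        return b"".join(self.index[fp] for fp in self.recipes[name])


def most_similar_container(segment_fps, containers):
    """Return the container id whose fingerprint set has the highest Jaccard
    similarity with the segment's fingerprints, plus that score."""
    seg, best, best_score = set(segment_fps), None, 0.0
    for cid, fps in containers.items():
        union = seg | fps
        score = len(seg & fps) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cid, score
    return best, best_score
```

A Bloom filter never yields false negatives, so most new chunks skip the index lookup. Only duplicates and occasional false positives cost an index probe. In a real system, `most_similar_container` would run on a segment of incoming fingerprints to decide which container's fingerprints to prefetch or where to co-locate related blocks.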
To address the loss of data reliability caused by deduplication, this paper puts forward a scheme that combines enterprise storage with cloud storage to improve system reliability simply and effectively.
Keywords/Search Tags: storage deduplication, index techniques, data reliability