
Research On Deduplication Technology In Cloud Storage

Posted on: 2020-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liu
Full Text: PDF
GTID: 2428330578460901
Subject: Computer technology
Abstract/Summary:
The continuous development of information technology is constantly changing how data is generated, so the volume of data that must be stored keeps growing. The accumulation of big data brings new opportunities: it contains deep value that traditional data cannot reveal, and its analysis and mining promise great commercial value. At the same time, big data poses huge challenges, as its volume far exceeds the processing capacity of traditional computing technologies. Massive data has also given rise to a storage model with high security, low cost, and fast processing speed: cloud storage.

Studies have found that both cloud storage systems and traditional storage systems hold large amounts of redundant data; in some systems the data repetition rate is as high as 70% to 90%, which makes a deduplication scheme both urgent and necessary. Deduplication removes redundant data from the storage system, saving storage space and network bandwidth and reducing data-center storage costs and daily energy consumption. However, traditional deduplication faces enormous challenges when applied to big data in cloud storage systems. First, the data stored in the cloud is more complex, larger, and more diverse. Second, the two conflicting goals of deduplication throughput and deduplication ratio cannot be reasonably balanced. This thesis addresses these issues; its main contributions are as follows:

1. Taking HDFS as the underlying storage layer, a cloud storage deduplication model named HDDep is established, whose improved fingerprint index structure makes it better suited to cloud storage systems.

2. A data partitioning method based on file type is introduced. Because redundant data across different file types is almost negligible, partitioning by file type reduces the range of fingerprint queries during deduplication.

3. A similarity clustering deduplication strategy (SCDS) is proposed, which removes more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the fingerprint query range with a similarity clustering algorithm: during deduplication, similar data fingerprints are grouped into the same cluster, so that only the fingerprints within one cluster need to be checked, which speeds up the retrieval of repeated fingerprints. Experiments show that the deduplication ratio of SCDS is better than that of existing similarity-based deduplication algorithms.
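The abstract itself contains no code, but the pipeline it describes (chunk the data, fingerprint each chunk, partition the index by file type, cluster similar fingerprints, and search for duplicates only within a cluster) can be sketched as below. This is a minimal illustration, not the thesis's implementation: the names `SCDSIndex` and `chunk_fingerprints` are invented for this sketch, fixed-size chunking stands in for whatever chunking scheme HDDep uses, and the min-fingerprint cluster key is a simplifying stand-in for the thesis's actual similarity clustering algorithm.

```python
import hashlib
from collections import defaultdict

def chunk_fingerprints(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and fingerprint each with SHA-1."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

class SCDSIndex:
    """Toy fingerprint index in the spirit of SCDS: fingerprints are first
    partitioned by file type (cross-type redundancy is assumed negligible),
    then grouped into clusters keyed by a cheap similarity representative
    (here the lexicographically smallest chunk fingerprint, a min-hash-style
    stand-in for a real similarity clustering algorithm). Duplicate
    detection searches only the matching cluster, not the global index."""

    def __init__(self):
        # file type -> cluster key -> set of stored chunk fingerprints
        self.index = defaultdict(lambda: defaultdict(set))

    def deduplicate(self, file_type: str, data: bytes):
        fps = chunk_fingerprints(data)
        cluster_key = min(fps)             # similarity representative
        cluster = self.index[file_type][cluster_key]
        new_fps = set(fps) - cluster       # chunks not yet stored
        cluster.update(new_fps)
        return len(fps), len(new_fps)      # (total chunks, chunks stored)
```

Storing the same file twice would then report zero new chunks on the second pass, since every fingerprint is already present in its cluster. The point of the cluster key is that a duplicate query touches one small fingerprint set instead of the whole index, which is the throughput-versus-ratio trade-off the abstract describes.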
Keywords/Search Tags:Cloud storage, Deduplication, Chunking index, Big data