
Research On Deduplication Technology In Cloud Storage

Posted on: 2020-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liu
Full Text: PDF
GTID: 2428330578460901
Subject: Computer technology
Abstract/Summary:
The continuous development of information technology is constantly changing how data is generated, so the volume of data that must be stored keeps growing. The accumulation of big data brings new opportunities: it contains deep value that traditional data cannot reveal, and its analysis and mining promise great commercial value. At the same time, big data poses huge challenges, as its volume far exceeds the processing capacity of traditional computing technologies. Massive data has also given rise to a storage model with high security, low cost, and fast processing speed: cloud storage.

Studies have found that both cloud storage systems and traditional storage systems hold large amounts of redundant data; in some systems the data repetition rate is as high as 70% to 90%, which makes a deduplication scheme both urgent and necessary. Deduplication removes redundant data from the storage system, saving storage space and network bandwidth and reducing data-center storage costs and daily energy consumption. However, traditional deduplication faces enormous challenges when applied to big data in cloud storage systems. First, the data stored in the cloud is more complex, larger, and more diverse. Second, the two conflicting goals of deduplication throughput and deduplication ratio cannot be reasonably balanced. This thesis addresses these issues; its main contributions are as follows:

1. Taking HDFS as the underlying storage layer, a cloud storage deduplication model named HDDep is established, whose improved fingerprint index structure makes it better suited to cloud storage systems.

2. A data partitioning method based on file type is introduced. Because redundant data across different file types is almost negligible, partitioning by file type reduces the range of fingerprint queries during deduplication.

3. A similarity clustering deduplication strategy (SCDS) is proposed, which removes more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the fingerprint query range with a similarity clustering algorithm: during deduplication, similar data fingerprints are grouped into the same cluster, so that only the fingerprints within one cluster need to be checked, which speeds up the retrieval of repeated fingerprints. Experiments show that the deduplication ratio of SCDS is better than that of existing similarity-based deduplication algorithms.
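The abstract itself contains no code, but the pipeline it describes (chunk the data, fingerprint each chunk, partition the index by file type, cluster similar fingerprints, and search for duplicates only within a cluster) can be sketched as below. This is a minimal illustration, not the thesis's implementation: the names `SCDSIndex` and `chunk_fingerprints` are invented for this sketch, fixed-size chunking stands in for whatever chunking scheme HDDep uses, and the min-fingerprint cluster key is a simplifying stand-in for the thesis's actual similarity clustering algorithm.

```python
import hashlib
from collections import defaultdict

def chunk_fingerprints(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and fingerprint each with SHA-1."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

class SCDSIndex:
    """Toy fingerprint index in the spirit of SCDS: fingerprints are first
    partitioned by file type (cross-type redundancy is assumed negligible),
    then grouped into clusters keyed by a cheap similarity representative
    (here the lexicographically smallest chunk fingerprint, a min-hash-style
    stand-in for a real similarity clustering algorithm). Duplicate
    detection searches only the matching cluster, not the global index."""

    def __init__(self):
        # file type -> cluster key -> set of stored chunk fingerprints
        self.index = defaultdict(lambda: defaultdict(set))

    def deduplicate(self, file_type: str, data: bytes):
        fps = chunk_fingerprints(data)
        cluster_key = min(fps)             # similarity representative
        cluster = self.index[file_type][cluster_key]
        new_fps = set(fps) - cluster       # chunks not yet stored
        cluster.update(new_fps)
        return len(fps), len(new_fps)      # (total chunks, chunks stored)
```

Storing the same file twice would then report zero new chunks on the second pass, since every fingerprint is already present in its cluster. The point of the cluster key is that a duplicate query touches one small fingerprint set instead of the whole index, which is the throughput-versus-ratio trade-off the abstract describes.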
Keywords/Search Tags:Cloud storage, Deduplication, Chunking index, Big data