
The Study And Improvement Of Deduplication Of Files In Cloud Storage Based On Bloom Filter

Posted on: 2017-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: F N Lin
GTID: 2308330503468525
Subject: Computer Science and Technology
Abstract/Summary:
Nowadays, as cloud storage becomes widespread and better understood, more and more users upload files to cloud storage to store them, to share them with other devices or users, or to back them up regularly. Without deduplication, this leads to a large number of identical files in cloud storage. Deduplicating file data reduces the storage space needed to hold the data, and also reduces the transmission bandwidth needed when backing up file data in case of failure. Deduplication therefore brings economic benefits to cloud storage providers and plays an important role in cloud storage.

In backup systems, because of the strong locality of the data and the small modifications between successive versions of files, files often reappear in the same or very similar sequences. Unlike backup systems, however, cloud storage receives its data mainly from personal computers, and this data source is random: there is no way to know which file will be uploaded to the cloud storage next.

Based on these features of the cloud storage data source, a method of file deduplication in cloud storage based on Bloom filters is proposed. In file chunking, each file type is chunked with the method that is most effective for the characteristics of that type. In chunk indexing, the index is built on file similarity theory, and a Bloom filter is added to speed up chunk lookup. Because a false positive in the Bloom filter has a different cost depending on the chunking method used, a differentiated Bloom filter is used so that the total cost is minimized. The method builds a model composed of a hash table, a differentiated Bloom filter, and an index of similar files.

In the experiments, the proposed method is compared with methods based on a common, non-differentiated Bloom filter, and with AA-Dedupe and Extreme Binning, which are implemented in a similar way. The results show that the proposed method reduces time consumption and memory consumption with little loss in deduplication rate.
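The abstract does not give the exact design of the differentiated Bloom filter or of the index model, so the following Python block is only a minimal sketch under stated assumptions. Each Bloom filter is sized with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The sketch assumes that the "total cost" is the sum of c_i * p_i, where p_i is the false-positive rate of the filter for file category i and c_i is a hypothetical cost per false positive for that category's chunking method, with the same query volume in every category. Minimizing this sum under a fixed total bit budget gives p_i proportional to n_i / c_i. The names `allocate_fprs`, `DedupIndex` and `dedup`, the category labels, the cost values, and the use of the minimum chunk fingerprint to select a bin of similar files (an idea borrowed from Extreme Binning) are all illustrative assumptions, not the thesis's implementation. Chunking is left out: each call receives a file's chunks already split.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter; k bit positions come from SHA-256 via double hashing."""

    def __init__(self, n_items: int, fpr: float):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(8, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: bytes) -> list[int]:
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(key))


def allocate_fprs(expected: dict[str, int], cost: dict[str, float],
                  total_bits: float) -> dict[str, float]:
    """Per-category false-positive rates p_i minimising sum(cost_i * p_i)
    subject to sum(-n_i ln p_i / (ln 2)^2) == total_bits (assumed cost model).

    The Lagrange condition gives p_i = lam * n_i / cost_i; lam is solved in closed form.
    """
    ln2_sq = math.log(2) ** 2
    total_n = sum(expected.values())
    log_lam = (-total_bits * ln2_sq
               - sum(n * math.log(n / cost[t]) for t, n in expected.items())) / total_n
    # Clamp at 0.5 so every filter stays useful; if a clamp triggers,
    # the real total size exceeds total_bits.
    return {t: min(0.5, math.exp(log_lam) * n / cost[t]) for t, n in expected.items()}


class DedupIndex:
    """Hypothetical hash table -> differentiated Bloom filter -> similar-file index."""

    def __init__(self, expected: dict[str, int], cost: dict[str, float], total_bits: float):
        fprs = allocate_fprs(expected, cost, total_bits)
        # Hash table: file category -> Bloom filter sized for that category's rate.
        self.filters = {t: BloomFilter(n, fprs[t]) for t, n in expected.items()}
        # Similar-file index: representative (minimum) chunk fingerprint -> bin.
        # A real system would keep bins on disk; a dict stands in here.
        self.bins: dict[bytes, set[bytes]] = {}

    def dedup(self, category: str, chunks: list[bytes]) -> list[bool]:
        """Return True for each chunk already stored; store the new ones."""
        if not chunks:
            return []
        fps = [hashlib.sha1(c).digest() for c in chunks]
        bin_ = self.bins.setdefault(min(fps), set())
        bloom = self.filters[category]
        result = []
        for fp in fps:
            # A Bloom miss proves the chunk is new, so the bin is not consulted.
            seen = fp in bloom and fp in bin_
            if not seen:
                bloom.add(fp)
                bin_.add(fp)
            result.append(seen)
        return result


if __name__ == "__main__":
    index = DedupIndex({"cdc": 1000, "whole_file": 1000},
                       {"cdc": 1.0, "whole_file": 10.0}, total_bits=20_000)
    print(index.dedup("cdc", [b"chunk-a", b"chunk-b"]))  # [False, False]
    print(index.dedup("cdc", [b"chunk-a", b"chunk-b"]))  # [True, True]
```

With the costs above, the category whose false positives are ten times more expensive receives a false-positive rate ten times lower, and therefore more bits. In this sketch, a chunk already stored under a different representative fingerprint is not detected, which illustrates how a similarity-based index can trade a little deduplication rate for cheaper lookups.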
Keywords: cloud storage, file deduplication, differentiated Bloom filter