
Design And Implementation Of Distributed Data Deduplication System Based On Chord Protocol

Posted on: 2019-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J X Liu
GTID: 2428330596460060
Subject: Software engineering
Abstract/Summary:
With the rapid development of cloud storage and big data technology, more and more data from all walks of life is stored in the cloud. Much of this data is redundant, which wastes valuable storage space. To reduce this waste, data deduplication is widely used in cloud storage. Current research on data deduplication focuses on two directions: redundant data detection, and distributed, scalable data deduplication. Both areas have produced valuable results, but shortcomings remain. In redundant data detection, the disk access bottleneck has not been fully solved. In distributed data deduplication, scalability, fault tolerance and load balancing still leave room for improvement.

To address the disk access bottleneck in redundant data detection, this thesis proposes a scheme that combines a B+ tree cluster with a hash table to accelerate detection. To address the problems in distributed data deduplication, this thesis designs and implements a distributed data deduplication system based on the Chord protocol. The main work of this thesis is as follows:

1. After analyzing the shortcomings of existing redundant data detection methods, a detection method that combines a B+ tree cluster with a hash table is proposed. The B+ tree cluster quickly detects data that exhibits spatial locality, and the hash table quickly detects data that does not. The combined method can therefore efficiently address the disk access bottleneck in redundant data detection.
2. On the basis of the Chord protocol, a distributed data deduplication prototype system is designed and implemented. The prototype consists of a resource location module, a node join module, a node failure module and a load balancing module.
3. Performance tests of the combined B+ tree cluster and hash table method are conducted, and the results verify the proposed method. Functional and performance tests of the Chord-based prototype system are also conducted, and the results verify its good scalability, fault tolerance and load balancing.
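The resource location idea in item 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `DedupRing`, the 32-bit identifier space, the use of SHA-1 fingerprints, and the per-node in-memory sets are all assumptions made for illustration. It also replaces Chord's finger-table routing with a lookup over a globally known sorted list of node identifiers, so it shows only the successor rule (a key belongs to the first node whose identifier is greater than or equal to the key, wrapping around the ring), not the O(log N) routing.

```python
import hashlib
from bisect import bisect_left

M = 32  # identifier bits; an assumed value, real Chord deployments often use 160


def ring_id(data: bytes, m: int = M) -> int:
    """Map arbitrary bytes onto the identifier ring [0, 2**m)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** m)


class DedupRing:
    """Hypothetical ring of nodes, each holding a set of chunk fingerprints."""

    def __init__(self, node_names):
        # node id -> fingerprints stored on that node
        # (names hashing to the same id would share one entry)
        self.index = {ring_id(name.encode()): set() for name in node_names}
        self.ids = sorted(self.index)

    def successor(self, key: int) -> int:
        """Return the first node id >= key, wrapping past the largest id."""
        i = bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def store(self, chunk: bytes) -> bool:
        """Route a chunk to its node; return True if new, False if duplicate."""
        fingerprint = hashlib.sha1(chunk).hexdigest()
        node = self.successor(ring_id(chunk))
        if fingerprint in self.index[node]:
            return False  # already stored: only a reference would be kept
        self.index[node].add(fingerprint)
        return True


ring = DedupRing(["node-a", "node-b", "node-c"])
first = ring.store(b"example chunk")   # new chunk
second = ring.store(b"example chunk")  # same content again
```

Because identical chunks hash to the same ring position, they always reach the same node, so each node can decide on duplicates using only its local index.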
Keywords/Search Tags: data deduplication, B+ tree cluster, hash table, Chord protocol