
Research And Optimization On Load Balance Of Data Deduplication

Posted on: 2016-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zhang
Full Text: PDF
GTID: 2348330536467744
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, storage systems face two demands. On one hand, their capacity must grow, because massive amounts of data have to be stored safely and effectively; on the other hand, their performance must improve, because the sheer volume of data places new demands on data-handling capability. With storage capacity growing far more slowly than the explosive growth of data, deduplication provides an effective means of data storage for big data environments. However, conventional data placement cannot exploit parallelism effectively, which leads to load imbalance. To address these problems, we improve and optimize the data placement strategy to raise throughput and read performance. The main contents and contributions of this thesis are as follows:

Firstly, this thesis conducts a comprehensive study of data deduplication systems. Starting from the basic principles, it sets out the deduplication workflow in detail and gives several classifications of deduplication. It then analyzes the key techniques of deduplication using existing systems as examples, including chunking methods, performance optimization, reliability, and scalability, and proposes improvements and optimizations for the remaining problems and challenges.

Secondly, this thesis focuses on the problem that placed data blocks cannot exploit the concurrency of storage nodes effectively, resulting in low throughput and poor read performance. For distributed storage nodes, we design and implement a file-aware data placement strategy for deduplication: while preserving the deduplication ratio, it follows the principle that a file's data blocks are mutually exclusive across nodes, places blocks as evenly as possible, and exploits the concurrency between nodes to improve access efficiency.

Furthermore, we optimize the storage-cavity problem of file-aware data placement, in which deduplicated blocks leave some nodes underloaded. After weighing the validity, storage efficiency, and performance of the placement strategy, we design two cavity-reduction mechanisms (a sketch of these ideas follows the abstract). The first is a "detente" (gradual) mechanism: it keeps storing data while slowly adjusting the strategy to reduce cavities; it preserves the effectiveness of the file-aware strategy but pays less attention to efficiency. The second is a "violence" (aggressive) mechanism: once a cavity exceeds a threshold, it ignores the file-aware strategy and fills the underloaded nodes directly; this is more efficient but degrades performance more at the same time.

Finally, we implement a simulation of the traditional B-Dedupe and the improved FA-Dedupe and compare their deduplication ratios and performance. The experimental results show that, while preserving the system's deduplication ratio, the proposed strategy trades a small amount of write performance for a much larger reduction in read latency, and reduces storage cavities to keep space utilization balanced.
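The abstract describes the placement policy only in prose. As a rough illustration, the following minimal Python sketch combines the two ideas summarized above: file-aware spreading of a file's unique chunks across nodes, and a threshold-triggered "violence" fill of underloaded (cavity) nodes. Everything here, including the class and parameter names, the SHA-1 fingerprinting, and the threshold values, is a hypothetical reconstruction for illustration, not the thesis's actual implementation.

```python
import hashlib

NUM_NODES = 4
CAVITY_THRESHOLD = 0.5   # hypothetical: a node is a "cavity" below 50% of mean load
MIN_MEAN_LOAD = 1.0      # hypothetical: skip cavity detection on a nearly empty system

class FileAwarePlacer:
    def __init__(self, num_nodes=NUM_NODES):
        self.node_load = [0] * num_nodes   # number of unique chunks stored per node
        self.chunk_index = {}              # fingerprint -> node id holding the chunk

    def _fingerprint(self, chunk: bytes) -> str:
        # Content-defined fingerprint used for duplicate detection.
        return hashlib.sha1(chunk).hexdigest()

    def _cavity_nodes(self):
        # Nodes whose load has fallen far below the mean ("storage cavities").
        mean = sum(self.node_load) / len(self.node_load)
        if mean < MIN_MEAN_LOAD:
            return []
        return [n for n, load in enumerate(self.node_load)
                if load < CAVITY_THRESHOLD * mean]

    def place_file(self, chunks):
        """Place one file's chunks, spreading new chunks across distinct nodes."""
        used_by_this_file = set()  # nodes already holding some chunk of this file
        for chunk in chunks:
            fp = self._fingerprint(chunk)
            if fp in self.chunk_index:
                # Duplicate chunk: no new copy is stored (deduplication).
                used_by_this_file.add(self.chunk_index[fp])
                continue
            cavities = self._cavity_nodes()
            if cavities:
                # "Violence" mode: ignore file awareness, fill the emptiest cavity.
                target = min(cavities, key=lambda n: self.node_load[n])
            else:
                # File-aware mode: prefer a node this file has not used yet,
                # so the file's blocks can be read from nodes in parallel.
                candidates = [n for n in range(len(self.node_load))
                              if n not in used_by_this_file]
                if not candidates:
                    candidates = list(range(len(self.node_load)))
                target = min(candidates, key=lambda n: self.node_load[n])
            self.chunk_index[fp] = target
            self.node_load[target] += 1
            used_by_this_file.add(target)

# Example: the third chunk duplicates the first and is deduplicated,
# so only two nodes receive new data.
placer = FileAwarePlacer()
placer.place_file([b"A" * 4096, b"B" * 4096, b"A" * 4096])
print(placer.node_load, placer.chunk_index)
```

In this sketch the "detente" mechanism is omitted; it would adjust placement gradually over successive files instead of overriding file awareness at a threshold, trading slower rebalancing for less interference with the file-aware layout.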
Keywords/Search Tags:Load Balance, Data Deduplication, File-aware, Data Placement, Storage Cavity