Research And Implementation Of Computation Performance Optimization Scheme For Data Deduplication System

Posted on: 2018-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yan
Full Text: PDF
GTID: 2348330566951630
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, storage systems hold large amounts of redundant data. Data deduplication can effectively eliminate this redundancy and has therefore gained wide attention in academia and industry. However, deduplication has a significant impact on storage system performance because of its large computation overhead. To reduce this impact, this thesis analyzes the data chunking and fingerprint calculation overhead of deduplication and examines the shortcomings of traditional computation performance optimization schemes, which use multicore processors to accelerate the heavy calculation. Although parallelization accelerates content-based data chunking, it destroys the content-based property of the chunks and thus reduces the deduplication ratio. To solve this problem, a parallel data chunking with joint scheme is proposed, which performs a joint (merge) step after parallel data chunking. Based on this scheme, the thesis designs and implements G-Dedup, a data deduplication system that uses GPU parallel computing to accelerate Rabin data chunking and SHA-1 fingerprint calculation. G-Dedup optimizes its implementation of Rabin data chunking and SHA-1 fingerprint calculation according to the hardware characteristics of the GPU. The results of Rabin data chunking are processed by a workload balancing scheme to improve GPU calculation efficiency in the fingerprint calculation stage. Moreover, the thesis designs and implements a pipeline strategy to alleviate the time overhead of serial execution and improve the throughput of G-Dedup. The experimental results show that G-Dedup effectively implements parallel data chunking and fingerprint calculation. The average throughput of G-Dedup is 2.01 GB/s, and the parallel data chunking with joint scheme decreases the deduplication ratio by 0.1% to 1.5%.
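The chunk-then-join idea described in the abstract can be illustrated with a small CPU sketch. This is not the G-Dedup implementation: it assumes a simple polynomial rolling hash in place of a true Rabin fingerprint, uses Python threads as a stand-in for GPU threads (so it shows the logic, not the speedup), and all names and constants (`WINDOW`, `MASK`, `content_cuts`, `chunk_segment`, `join_segments`, `parallel_chunk_with_join`, `fingerprints`) are hypothetical. Each worker chunks its segment independently and is forced to cut at the segment edge; the join step then drops those forced cuts so that only content-defined boundaries remain.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

WINDOW = 16           # rolling-window width in bytes (assumed value)
MASK = (1 << 6) - 1   # cut when the low 6 hash bits are all ones (assumed value)
BASE = 257            # polynomial base of the stand-in rolling hash
MOD = (1 << 31) - 1   # modulus of the stand-in rolling hash


def content_cuts(data, start, end):
    """Content-defined cut offsets (exclusive chunk ends) inside data[start:end].

    The window restarts at `start`, as it would for an independent worker.
    """
    cuts, h = [], 0
    top = pow(BASE, WINDOW - 1, MOD)
    for i in range(start, end):
        if i - start >= WINDOW:
            # Slide the window: remove the byte that just left it.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + data[i]) % MOD
        # Only test for a boundary once the window is full.
        if i - start >= WINDOW - 1 and (h & MASK) == MASK:
            cuts.append(i + 1)
    return cuts


def chunk_segment(data, start, end):
    """Chunk one segment; report whether its final cut was forced by the edge."""
    cuts = content_cuts(data, start, end)
    forced = not cuts or cuts[-1] != end
    if forced:
        cuts.append(end)  # artificial cut at the segment edge
    return cuts, forced


def join_segments(results):
    """Join step: drop forced segment-edge cuts, except the end of the data."""
    joined = []
    for k, (cuts, forced) in enumerate(results):
        is_last = k == len(results) - 1
        joined.extend(cuts if (is_last or not forced) else cuts[:-1])
    return joined


def parallel_chunk_with_join(data, segment_size, workers=4):
    """Chunk fixed-size segments concurrently, then join across segment edges."""
    bounds = [(s, min(s + segment_size, len(data)))
              for s in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: chunk_segment(data, b[0], b[1]), bounds))
    chunks, prev = [], 0
    for cut in join_segments(results):
        chunks.append(data[prev:cut])
        prev = cut
    return chunks


def fingerprints(chunks):
    """SHA-1 fingerprint (hex string) of every chunk."""
    return [hashlib.sha1(c).hexdigest() for c in chunks]


# Small demonstration on deterministic pseudo-random bytes.
demo = bytes(random.Random(0).getrandbits(8) for _ in range(20000))
demo_chunks = parallel_chunk_with_join(demo, segment_size=4096)
print(len(demo_chunks), "chunks,", len(set(fingerprints(demo_chunks))), "unique fingerprints")
```

In this sketch, every boundary that survives the join is also a boundary that serial chunking would find, but boundaries falling within the first `WINDOW - 1` bytes of a segment are missed because the window restarts there. That gives one plausible intuition for why a joined parallel scheme can still lose a small amount of deduplication ratio compared with serial chunking.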
Keywords/Search Tags: Data deduplication, Computation overhead, Parallel computing, GPU