
Research On Data De-duplication Technology In Parallel Systems

Posted on: 2014-03-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Zhu
Full Text: PDF
GTID: 1228330425973321
Subject: Computer system architecture
Abstract/Summary:
With the development of the Internet, more and more network-based applications generate massive data, now reaching several petabytes. Because finding and removing redundant data makes more efficient use of storage space, data de-duplication has become an important topic in network storage research. De-duplication is both computation-intensive and I/O-intensive: as the volume of data to be backed up grows, its performance is dominated by the hash-based chunking computation and the on-disk index lookups. With the spread of multi-core and many-core processors, it is therefore important to improve the performance of both the computation and the disk-index access in parallel environments.

Most recent research on parallel data de-duplication relies on GPU-accelerated computation and multi-threaded disk index access. However, as parallelism increases, both approaches encounter performance bottlenecks that limit their scalability. This dissertation therefore analyzes the dominant cost factors of the two methods using corresponding performance models, proposes two optimized solutions that alleviate their bottlenecks, and validates the improvements in experiments on real-world data sets.

When analyzing the performance of parallel systems, it is important to model the main aspects of their operation in order to locate the bottlenecks. Taking into account the main procedure, the concurrency, and the sharing of and contention for resources in the GPU-accelerated and parallel disk-index-access methods, this dissertation proposes two performance models based on Stochastic Petri nets. By computing the utilization ratios of the models, it deduces the performance bottleneck of each method.
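As background, the chunk-then-look-up pipeline whose two costly stages (fingerprint computation and index lookup) the dissertation targets can be sketched as below. The fixed-size chunking, SHA-256 fingerprints, and in-memory index are illustrative assumptions only; the dissertation concerns on-disk indexes and GPU-accelerated hashing.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed-size chunking; real systems often use content-defined chunking

def deduplicate(data: bytes, index: dict) -> tuple[int, int]:
    """Split data into chunks, fingerprint each, and store only new chunks.

    Returns (total_chunks, unique_chunks_stored).
    """
    total = unique = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # hash computation: the compute-intensive stage
        total += 1
        if fp not in index:                     # index lookup: the I/O-intensive stage on disk
            index[fp] = chunk                   # store only previously unseen chunks
            unique += 1
    return total, unique

index = {}
data = b"abcd" * 4096                           # highly redundant input: 4 identical chunks
total, unique = deduplicate(data, index)        # → total == 4, unique == 1
```

In this toy form the index fits in memory; at petabyte scale it must live on disk, which is exactly why index-access parallelism becomes the bottleneck the dissertation studies.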
It then proposes corresponding improvements to alleviate these bottlenecks and evaluates them on real-world data sets.

In the GPU-accelerated de-duplication method, the data-transfer latency between host RAM and GPU memory is the main performance bottleneck. This work found that repeatedly transferring the same data to the GPU produces redundant transfer latency. To remove it, the dissertation optimizes the procedure of the traditional GPU-accelerated method; in the experiments, the optimized method achieves better performance.

Because only unique data may be stored, the parallel disk-index-access method needs a synchronization mechanism to avoid conflicts between index-accessing threads. The traditional lock-based mechanism incurs heavy consistency overhead as the number of index-accessing threads grows. To alleviate this bottleneck, the dissertation proposes a DHT-based index-access mechanism in which each thread accesses its own sub-index, selected by the suffix of a chunk's fingerprint. In the experiments, this mechanism also achieves better performance.
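The suffix-based partitioning idea can be illustrated as follows. This is a minimal single-threaded sketch of the routing rule only; the SHA-256 fingerprints, the hex-suffix shard function, and the shard count of 16 are assumptions for illustration, not the dissertation's actual parameters.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 16  # illustrative: one sub-index (and one worker thread) per suffix value

def shard_of(fingerprint: str) -> int:
    # Route a fingerprint to a sub-index by its last hex digit. Because each
    # shard is owned by exactly one thread, threads never touch the same
    # sub-index and need no lock-based synchronization.
    return int(fingerprint[-1], 16) % NUM_SHARDS

shards = defaultdict(dict)  # shard id -> sub-index {fingerprint: chunk}
for chunk in (b"alpha", b"beta", b"alpha"):
    fp = hashlib.sha256(chunk).hexdigest()
    shards[shard_of(fp)].setdefault(fp, chunk)  # duplicate "alpha" is stored once

total_entries = sum(len(sub) for sub in shards.values())  # → 2 unique chunks
```

Since the fingerprint already determines the shard, the routing itself requires no coordination between threads, which is what lets this scheme scale where a single lock-protected index does not.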
Keywords/Search Tags:Data De-duplication, Parallel, Multi-core, Stochastic Petri Net