Research On Parallel Data Redundancy Elimination Strategy In Cloud Storage With MPI And Four-stage Pipelines

Posted on:2021-05-01

Degree:Master

Type:Thesis

Country:China

Candidate:B S Zhu

Full Text:PDF

GTID:2428330605972992

Subject:Computer Science and Technology

Abstract/Summary:


Cloud storage is an effective and critical technology for data storage in the era of big data. However, because many files and data blocks are stored on the cloud server, server resources are wasted when the same file or data block is stored repeatedly, so data redundancy elimination (deduplication) should be applied. Because redundancy elimination takes too long and cloud server resources cannot be fully utilized during deduplication, the speed of redundancy elimination needs to be increased and the resource utilization of the cloud server improved.

This paper proposes a parallel data redundancy elimination strategy for cloud storage based on the Message Passing Interface (MPI) and four-stage pipelines. On the client, a four-stage pipeline parallel data partition strategy accelerates the partitioning of data (reading the file, data partitioning, data compression, and fingerprint calculation). On the cloud server, after the master node receives the data blocks and block metadata, it distributes the block metadata evenly to the slave nodes through MPI, and each slave node performs deduplication against the global Bloom filter matrix. When a false positive occurs because of a hash collision, redundancy elimination is performed on the secondary indexing structure. Deduplication thus runs in parallel on multiple slave nodes, and each non-redundant data block is stored on the corresponding slave node, which completes data redundancy elimination in the cloud storage environment.

The experiments consist of three parts. First, the client partitions the file using the four-stage pipeline parallel data partition strategy (reading the file, data partitioning, data compression, and fingerprint calculation). Second, on the cloud server side, virtual machines and MPI are used to build a parallel computing environment of multiple nodes running CentOS 7, in which one node is the master and the others are slaves; the redundancy elimination time and the total processing time are measured with 4, 8, 16, 24, and 32 slave nodes for files of 2.19 MB and 300 MB. Third, the improvement in indexing performance from the secondary indexing structure used by the cloud servers is verified for data retrieval.

The results show that, on the client side, the four-stage pipeline parallel partition strategy greatly improves data partition speed compared with a single-core processor. On the cloud server, the MPI-based parallel redundancy elimination strategy greatly reduces redundancy elimination time compared with the strategy of sending each data block to an arbitrary slave node for deduplication, and the reduction becomes more pronounced as the number of data blocks grows. Compared with the linked hash indexing structure, the secondary indexing structure reduces data retrieval time and so improves indexing performance, and the larger the file, the greater the improvement in retrieval performance.
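The server-side check described in the abstract (test a block fingerprint against a Bloom filter, then confirm any positive in a secondary index so that false positives do not discard new data) can be illustrated with a minimal single-node sketch. Everything here is an assumption made for illustration: the class names `BloomFilter` and `DedupNode`, the use of SHA-1 as the fingerprint, the filter size, and the plain dictionary standing in for the secondary indexing structure. The thesis's global Bloom filter matrix and its MPI distribution across slave nodes are not reproduced.

```python
import hashlib


class BloomFilter:
    """Plain bit-array Bloom filter (a stand-in, not the thesis's filter matrix)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint):
        # Derive num_hashes bit positions by salting the fingerprint with one byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        # False means "definitely new"; True means "possibly seen before".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


class DedupNode:
    """Hypothetical storage node: Bloom filter first, exact index on a positive."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}  # fingerprint -> block; assumed stand-in for the secondary index

    def store(self, block):
        """Return True if the block was new and stored, False if it was a duplicate."""
        fingerprint = hashlib.sha1(block).digest()  # assumed fingerprint function
        # A Bloom positive may be a false positive, so confirm it in the exact index.
        if self.bloom.might_contain(fingerprint) and fingerprint in self.index:
            return False
        self.bloom.add(fingerprint)
        self.index[fingerprint] = block
        return True


node = DedupNode()
results = [node.store(b) for b in [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]]
```

In this sketch the Bloom filter only lets a clearly new fingerprint skip the exact lookup. Because every positive is confirmed in the index, the result does not depend on whether the filter happens to produce a false positive.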

Keywords/Search Tags:

MPI, four-stage pipelines, cloud storage, data redundancy elimination, parallel computing


Related items

1	Research On Reliability And Security Of Data Storage Technology In Cloud Computing
2	Research On Data Distribution Strategies For Cloud Storage Based On Data Redundancy
3	TailoredRE: A personalized cloud-based traffic redundancy elimination for smartphones
4	Research And Implementation Of TCP Acceleration Technology Under Cloud Service
5	General Cloud-native Big Data Architecture With Kubernetes
6	Research And Implemetation Of Data Redundancy Elimination Technology For Wide Area Network
7	Study On DHT Based Open P2P Cloud Storage Services Systems
8	Research On High Performance Redundancy Elimination Techniques For Data Backup Systems
9	Research On The Optimization Method Of Cloud Storage Security Merging Reputation Evaluation
10	Research On Secure And Efficient Cloud Storage Based On Fog Computing Schema