
Research on Performance Optimization of Virtual Machine Image Deduplication for Cloud Data Center

Posted on: 2018-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liang
GTID: 2348330542983640
Subject: Software engineering
Abstract/Summary:
Cloud data centers keep growing in size, and the number of virtual machine images is expanding rapidly, which poses a severe challenge to storage management in cloud data centers. In addition, the vast body of virtual machine images contains a high proportion of duplicate data, which consumes a lot of storage resources, incurs a large amount of I/O overhead, and greatly affects the performance of cloud data centers. Although virtual machine image deduplication methods have made some progress, many challenges remain, and existing methods are not suited to massive image backup scenarios with timeliness requirements. Virtual machine image files are large, highly similar to one another, and seldom modified. For the scenario of deduplicated backup of massive numbers of virtual machine images, this thesis adds preprocessing-based performance optimization steps to the traditional image deduplication method. The optimization aims to preserve the deduplication rate while shortening the deduplication time.

First, in cloud data centers, backups of large numbers of virtual machine images are subject to backup time limits, and traditional deduplication technology suffers from an index performance bottleneck. This thesis therefore proposes an improved deduplication optimization method based on image similarity (ISGA). The method targets backup scenarios involving massive numbers of virtual machine images and aims to improve overall system performance while guaranteeing the deduplication rate. It randomly samples the images, which narrows the sampling range and thus reduces the range and frequency of fingerprint index lookups and comparisons. It then computes fingerprints of the sampled image blocks, measures the similarity between images by the number of identical fingerprints, and places images whose similarity reaches a threshold into the same set, eventually forming several groups of similar images. Finally, deep deduplication is performed within each image group. After this preprocessing, the index range of the deduplication process is smaller, the deduplication time is shorter, and backup efficiency is improved. The method is especially suitable for fast backup of large numbers of virtual machine images.

Second, the preprocessing above may lower the deduplication rate, and the backup deduplication system is still limited by the disk bottleneck. After the preprocessing, this thesis therefore introduces an improved Bloom-filter-based algorithm that runs before the traditional fingerprint lookup. Images are partitioned into fixed-size blocks, MD5 and SHA-1 hashes serve as the block fingerprints, and a two-layer Bloom filter uses these fingerprints to quickly test and filter data blocks. These operations reduce fingerprint accesses and memory operations and relieve the disk performance bottleneck, so as to maximize the deduplication rate.

In summary, the approach uses image similarity to address the index bottleneck and the Bloom filter stage to address the disk bottleneck in image backup deduplication. It speeds up index lookup while preserving the deduplication rate, improving the overall performance of the image backup system. The experimental results show that, compared with virtual machine image clustering, the proposed optimization method has the following advantages: with the same number of images, it needs less time for deduplicated backup; and as the number of image types increases, the total time it needs is also shorter. The results indicate that the proposed method is better suited to cloud data center scenarios with timeliness requirements, large numbers of virtual machines, and a wide variety of virtual machines.
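
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's ISGA implementation, and several details are assumptions the abstract does not specify: the 4096-byte block size, sampling the same random block positions from every image, greedy grouping against each group's first member, and placing the MD5 fingerprint in the first Bloom filter layer and the SHA-1 fingerprint in the second. The names `group_similar`, `dedup_group`, and `BloomFilter` are hypothetical.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed fixed block size; the abstract gives no value


def split_blocks(data, size=BLOCK_SIZE):
    """Fixed-size partitioning of an image into blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sampled_fingerprints(data, indices):
    """SHA-1 fingerprints of the blocks at the sampled positions."""
    blocks = split_blocks(data)
    return {hashlib.sha1(blocks[i]).digest() for i in indices if i < len(blocks)}


def group_similar(images, samples, threshold, seed=0):
    """Greedy grouping (an assumption): an image joins the first group whose
    representative shares at least `threshold` sampled fingerprints with it."""
    max_blocks = max((len(d) + BLOCK_SIZE - 1) // BLOCK_SIZE for d in images.values())
    indices = random.Random(seed).sample(range(max_blocks), min(samples, max_blocks))
    groups = []  # pairs of (representative fingerprints, member names)
    for name, data in images.items():
        fps = sampled_fingerprints(data, indices)
        for rep_fps, members in groups:
            if len(fps & rep_fps) >= threshold:
                members.append(name)
                break
        else:  # no existing group was similar enough
            groups.append((fps, [name]))
    return [members for _, members in groups]


class BloomFilter:
    """Plain Bloom filter over hash digests, using double hashing."""

    def __init__(self, bits=1 << 16, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, digest):
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, digest):
        for p in self._positions(digest):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, digest):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(digest))


def dedup_group(images):
    """Deduplicate one group. The exact index is consulted only when both
    Bloom filter layers report the block as possibly present."""
    layer1, layer2 = BloomFilter(), BloomFilter()
    store = {}    # exact index: SHA-1 fingerprint -> block
    lookups = 0   # number of exact index lookups actually performed
    for data in images.values():
        for block in split_blocks(data):
            md5 = hashlib.md5(block).digest()
            sha1 = hashlib.sha1(block).digest()
            if layer1.might_contain(md5) and layer2.might_contain(sha1):
                lookups += 1
                if sha1 in store:
                    continue  # confirmed duplicate, nothing to store
            store[sha1] = block
            layer1.add(md5)
            layer2.add(sha1)
    return store, lookups
```

In this sketch, `group_similar` would be called once over all images, and `dedup_group` once per resulting group, so each fingerprint index only covers images that are already known to be similar. Because a Bloom filter never produces false negatives, a block rejected by either layer is certainly new and skips the exact lookup, which is the step that would otherwise touch the on-disk index.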
Keywords/Search Tags: Virtual machine, Image, Deduplication, Similarity, Bloom filter