Research On Virtual Machine Image File Deduplication

Posted on: 2018-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: E G M T S B E Tu
GTID: 2348330512475623
Subject: Information security
Abstract/Summary:
With the development of virtual machine technology and virtual computing environments, virtual machine image files have become the carrier for storage and transmission, which brings great convenience to cloud computing. However, as the number of "one-off" virtual machines created by users grows, the number of virtual machine image files on cloud platforms has also skyrocketed, and the resulting redundant data poses a huge challenge for cloud computing providers, so deduplication of virtual machine image files is necessary. Existing deduplication of identical virtual machine image files has shortcomings in how the deduplication granularity is selected and divided: Hash-based file-level deduplication does not consider the similarities between virtual machine image files, and deduplication of similar virtual machine image files remains unaddressed. To solve this problem, this thesis designs and implements a hierarchical deduplication scheme based on SimHash that can deduplicate similar virtual machine image files. The main contents are as follows:

(1) The relationship between image file format and the similarity among virtual machine image files is analyzed. The results show that the format of a virtual machine image file is closely related to its data redundancy, and that image files of the same format share more than 60% similar data, which demonstrates that studying deduplication of both identical and similar image files is necessary.

(2) A hierarchical deduplication scheme for virtual machine image files based on the SimHash algorithm is designed and implemented. The scheme divides the image file into data blocks using fixed-size blocking. An improved SimHash function is used to compute the SimHash value, which serves as a unique identifier. The SimHash ID is transmitted in advance to reduce network transmission overhead, and file similarity comparison then drives graded deduplication: the first level takes the file as its object, and the second level takes the data block as its object.

(3) The implementation is tested. The deduplication rate, deduplication accuracy, feasibility, and stability are measured and compared with the original data deduplication scheme. The experimental results show that the scheme is feasible and has certain advantages in deduplication rate and deduplication accuracy, saving nearly 60% of storage space, but its stability has some deficiencies that require further study.
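The two-level flow described in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the "improved" SimHash function, so the sketch uses the standard SimHash construction with fixed-size blocks as features. The block size, fingerprint width, similarity threshold, and all function names are assumptions chosen for illustration. Because a SimHash value is a similarity fingerprint and not a collision-resistant identifier, the sketch swaps in SHA-256 for block identity at the second level.

```python
import hashlib

BLOCK_SIZE = 4096          # assumed; the abstract does not state a block size
FINGERPRINT_BITS = 64      # assumed fingerprint width
SIMILARITY_THRESHOLD = 3   # assumed maximum Hamming distance for "similar" files


def split_blocks(data, size=BLOCK_SIZE):
    """Fixed-size blocking: cut the image into consecutive size-byte blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_id(block):
    """Exact block identity (SHA-256 here, an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


def simhash(blocks):
    """Standard SimHash: each block votes +1/-1 on every fingerprint bit."""
    counts = [0] * FINGERPRINT_BITS
    for block in blocks:
        h = int.from_bytes(hashlib.sha256(block).digest()[:8], "big")
        for bit in range(FINGERPRINT_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit, c in enumerate(counts) if c > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedup_level(new_fp, stored_fp, threshold=SIMILARITY_THRESHOLD):
    """Decide the deduplication level from fingerprints alone.

    Only the fingerprint has to be sent before any image data moves.
    Equal fingerprints suggest, but do not prove, identical files; a real
    system would confirm with an exact hash before dropping the upload.
    """
    distance = hamming(new_fp, stored_fp)
    if distance == 0:
        return "file"    # level 1: treat the whole file as a duplicate
    if distance <= threshold:
        return "block"   # level 2: deduplicate block by block
    return "none"        # dissimilar: store the file as-is


def missing_blocks(blocks, stored_ids):
    """Level 2: return only the blocks whose IDs are not already stored."""
    seen = set(stored_ids)
    missing = []
    for block in blocks:
        bid = block_id(block)
        if bid not in seen:
            seen.add(bid)
            missing.append(block)
    return missing


# Toy example: a stored "image" of 8 blocks and a variant with one block changed.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
variant = base[:3 * BLOCK_SIZE] + bytes([200]) * BLOCK_SIZE + base[4 * BLOCK_SIZE:]

base_blocks = split_blocks(base)
stored_ids = {block_id(b) for b in base_blocks}
to_store = missing_blocks(split_blocks(variant), stored_ids)
print(len(to_store))  # 1 of the variant's 8 blocks needs to be stored
```

In this toy case only the changed block would be transferred and stored. Whether a given pair of images falls within the similarity threshold depends on the data and on the threshold chosen, so a real deployment would need to tune that value.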
Keywords/Search Tags:Deduplication, Virtual Machine Image Files, SimHash, Hash