Design And Implementation Of Similar Data Deduplication System In Cloud Environment

Posted on: 2021-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Liu
Full Text: PDF
GTID: 2518306050466604
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of cloud storage services, more and more individual and enterprise users choose to store their data on cloud servers. As a result, cloud servers accumulate large amounts of redundant data, which wastes network bandwidth and storage resources. Data deduplication technology addresses this problem: when multiple duplicate copies of a file are stored on the cloud server, only one copy is kept. However, conventional deduplication requires users to upload their files in plaintext, and storing plaintext on the cloud server brings serious security risks. For example, mining and analyzing user data can provide cloud service providers with valuable financial opportunities and decision-making information. Driven by this interest, a cloud service provider may become "dishonest", which puts user data at risk of leakage. To protect the privacy of sensitive data, files therefore need to be encrypted before they are uploaded. But when users encrypt files before upload, each user chooses a file key at random, so different users use different keys and the same file yields completely different ciphertexts.

For this reason, researchers have proposed ciphertext deduplication schemes based on cryptographic hash values. In such a scheme, the user first uploads the hash value of the file. Identical files have the same hash value and different files have different hash values, so the server can judge whether a file is a duplicate. However, the cloud environment contains a large number of redundant multimedia files, especially image files. Geometric distortions such as cropping, rotation, or scaling change only the binary representation of an image, not how people perceive it. Deduplication based on cryptographic hash values is unsuitable in this case, because such files should be removed as duplicate copies of the same image.

Therefore, this thesis focuses on the security of image ciphertext deduplication in the cloud environment. Combining the Advanced Encryption Standard (AES) algorithm with elliptic curve cryptography (ECC), and targeting the low deduplication rate of the widely used file-level and block-level deduplication techniques, this thesis proposes a secure deduplication scheme based on data similarity that can be extended to other file types. The specific work of this thesis is as follows:

(1) File feature extraction. A feature value is extracted from each file and used to judge data similarity, and different file types use different extraction methods: a perceptual hash value is extracted from image files, and a fuzzy hash value is extracted from other files. Existing image deduplication schemes focus on reducing the differences between the perceptual hash values of similar images and pay less attention to increasing the differences between the perceptual hash values of different images. To address this, the thesis proposes and implements two difference-based image perceptual hashing algorithms, Seq pHash and Laplace pHash, and tests them on small-scale sets of similar images and large-scale sets of different images respectively. Compared with existing image perceptual hashing algorithms, the proposed algorithms capture the local correlation between image pixels, increase the difference between the perceptual hash values of different images, and significantly improve the probability of distinguishing different images.

(2) Research on security technology for data sharing. The data sharing security scheme in this thesis focuses on the legitimate authorization of file access and the confidentiality of an enterprise's internal data. The AES algorithm is adopted, and a master symmetric key is introduced to further obfuscate the key of the traditional convergent encryption scheme. The result is used as the file convergent key in this scheme. This prevents the brute-force attacks to which traditional convergent encryption is vulnerable when the entropy of the file is low. When users upload files, view file directories, download files, or share files, challenge-response and digital signatures are used to complete entity authentication between clients or between a client and the cloud server. File sharing involves transmitting the file symmetric key, so an elliptic curve cryptosystem is used to generate a temporary public-private key pair that protects this transmission. In the protocols of the data sharing security scheme, a one-time random number is used as the freshness flag to ensure the freshness of messages and the liveness of the communicating parties. Security analysis shows that the system has good security properties and can partly resist message replay attacks and man-in-the-middle attacks.

(3) Design and implementation of a similar data deduplication system in the cloud environment. Based on the composition and business function requirements of the deduplication system, a similar data deduplication system is designed and implemented in the cloud environment. The system consists of six function modules. The user registration module and the user login module implement authorized user access. The file upload and similarity check module implements duplicate detection and file upload. The file directory viewing module retrieves the ciphertext of files uploaded by other users in the enterprise. The file sharing request module and the file sharing permission module implement secure sharing of the file symmetric key.

In summary, to address the low deduplication rate of file-level and block-level deduplication techniques, this thesis proposes an image file feature extraction algorithm based on data similarity, designs a secure data sharing scheme, and implements a similar data deduplication system in the cloud environment. On the basis of proven security, the system reduces the communication overhead of client-side deduplication and improves the utilization of storage resources on the cloud server. In addition, the system has been deployed in an enterprise cloud environment, and practical use has verified that it meets users' demand for similar data deduplication.
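As an illustration of why item (1) turns to perceptual hashing, the following Python sketch contrasts a cryptographic hash with a simple difference-based hash. It is not the thesis's Seq pHash or Laplace pHash, whose details the abstract does not give. It is a generic difference hash over a tiny grayscale grid, and the names `dhash_bits`, `hamming`, `is_similar`, and the threshold value are all assumptions made for this example. A uniform brightness shift stands in for a perception-preserving change; robustness to cropping, rotation, or scaling would need a more elaborate algorithm.

```python
import hashlib


def dhash_bits(pixels):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    A bit is 1 when the left pixel is brighter than its right neighbour.
    """
    return [
        1 if row[i] > row[i + 1] else 0
        for row in pixels
        for i in range(len(row) - 1)
    ]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def is_similar(a, b, threshold=2):
    """Hypothetical decision rule: similar if few hash bits differ."""
    return hamming(dhash_bits(a), dhash_bits(b)) <= threshold


def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# A toy 4x4 grayscale "image" (values 0..255).
original = [
    [10, 40, 90, 160],
    [20, 60, 120, 200],
    [200, 150, 80, 30],
    [90, 90, 30, 10],
]
# Same picture, slightly brighter: every byte changes, the content does not.
brightened = [[min(255, v + 5) for v in row] for row in original]
# A visually different picture (inverted intensities).
inverted = [[255 - v for v in row] for row in original]

# The cryptographic hash treats the brightened copy as a different file ...
print(sha256_hex(original) == sha256_hex(brightened))
# ... while the difference hash still groups it with the original.
print(is_similar(original, brightened), is_similar(original, inverted))
```

In this toy case the brightened copy keeps every neighbour relation, so its difference hash matches the original exactly, while the inverted image flips almost all bits. A real deployment would need a threshold tuned on representative image sets.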
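The key-hardening idea in item (2) can be sketched as follows. Plain convergent encryption derives the file key from the file content alone, so anyone who can guess a low-entropy file can recompute its key. Mixing in an enterprise-wide secret removes that shortcut. The abstract says AES and a master symmetric key are used but does not give the exact construction, so this sketch substitutes HMAC-SHA256 from the Python standard library as the mixing step. The function names and the example keys are assumptions for illustration only.

```python
import hashlib
import hmac


def convergent_key(data: bytes) -> bytes:
    """Traditional convergent key: a hash of the file content alone."""
    return hashlib.sha256(data).digest()


def hardened_convergent_key(master_key: bytes, data: bytes) -> bytes:
    """Convergent key mixed with a master secret.

    Assumption: HMAC-SHA256 stands in for the thesis's AES-based step.
    """
    return hmac.new(master_key, convergent_key(data), hashlib.sha256).digest()


# Hypothetical enterprise master key and an outsider's guess at it.
enterprise_key = b"example-enterprise-master-key"
outsider_key = b"some-other-key"

# A low-entropy file: an attacker could enumerate candidates like this one.
low_entropy_file = b"salary: 5000"

# Two users inside the enterprise derive the same key, so their
# ciphertexts match and can still be deduplicated.
key_user_a = hardened_convergent_key(enterprise_key, low_entropy_file)
key_user_b = hardened_convergent_key(enterprise_key, low_entropy_file)
print(key_user_a == key_user_b)

# Without the master key, guessing the file content is no longer enough
# to reproduce the file key.
key_outsider = hardened_convergent_key(outsider_key, low_entropy_file)
print(key_outsider == key_user_a)
```

The derived 32-byte value could then serve as an AES-256 file key. The trade-off is that deduplication only works among users who hold the same master key, which matches the enterprise setting the abstract describes.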
Keywords/Search Tags: Cloud Storage, Similarity, Perceptual Hashing, Fuzzy Hash, Secure Deduplication