
Research On Security Technologies Of Data Deduplication For Cloud Storage Systems

Posted on: 2019-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y K Zhou
Full Text: PDF
GTID: 1368330590450408
Subject: Computer Science and Technology
Abstract/Summary:
With the exponential growth of global data volume, the storage and protection of large-scale data have become important research challenges. Cloud storage systems provide high-capacity, scalable, pay-as-you-go services, and an increasing number of users and enterprises move their data to the cloud. Analyses of real-world workloads show that storage systems hold large amounts of redundant data, so they apply data deduplication to eliminate this redundancy and save storage cost and network bandwidth. However, deduplication in cloud storage systems faces security threats and performance problems in its security mechanisms: the confidentiality and computational overhead of deduplication-compatible encryption, the privacy preservation and computational cost of file uploading and accessing, and the availability and storage overhead of shared data. Building a secure and efficient deduplication-based cloud storage system is therefore an urgent problem. This dissertation studies efficient security methods for data deduplication in cloud storage systems, covering confidentiality, privacy, and availability.

To address the brute-force-attack vulnerability and the large time overheads of existing encryption methods for cross-user deduplication, we propose SecDep, a user-aware convergent encryption (CE) scheme. Inter-user deduplication faces higher security risks, and existing methods run a server-aided algorithm on every chunk to strengthen security, which incurs large time overheads because of the many exponentiation operations involved. SecDep therefore applies a server-aided encryption algorithm with stronger security to inter-user file-level deduplication and a low-cost user-aided encryption algorithm to intra-user chunk-level deduplication, generating random and secure keys that resist brute-force attacks while keeping time overheads low. Combining inter-user file-level with intra-user chunk-level deduplication preserves most of the deduplication factor. To protect keys, SecDep encrypts the chunk-level keys with a file-level key, splits the file-level key into shares with Shamir's Secret Sharing scheme, and distributes the shares to multiple key servers. Security analysis demonstrates that SecDep resists brute-force attacks and ensures data confidentiality. Experimental results show that SecDep reduces time overheads by 52%-92% at the cost of losing 2.8%-7.35% of the deduplication factor compared with DupLESS-chunk.
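SecDep's key protection step, splitting the file-level key into shares with Shamir's Secret Sharing and handing the shares to distributed key servers, can be illustrated with a minimal (k, n) threshold sketch. The code below is an illustration under stated assumptions, not SecDep's implementation: the prime field, the threshold parameters, and the helper names split_key and recover_key are all chosen here for clarity.

    import secrets

    PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit file key

    def split_key(secret: int, n: int, k: int):
        """Split `secret` into n shares; any k of them reconstruct it (Shamir)."""
        coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
        shares = []
        for x in range(1, n + 1):
            y = 0
            for c in reversed(coeffs):   # Horner evaluation of the polynomial at x
                y = (y * x + c) % PRIME
            shares.append((x, y))
        return shares

    def recover_key(shares):
        """Lagrange interpolation at x = 0 over the prime field."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    # Example: a random file-level key split across 5 key servers, threshold 3.
    file_key = secrets.randbits(256)
    shares = split_key(file_key, n=5, k=3)
    assert recover_key(shares[:3]) == file_key

Because the chunk-level keys are encrypted under the file-level key, recovering them requires reassembling at least k shares from the key servers.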
When a file is uploaded to a deduplication-based cloud storage system, the client must run a proof-of-ownership (PoW) protocol to convince the server that it actually holds the file content, defeating hash-as-a-proof attacks. To avoid the privacy leakage caused by false positives and to support data updating, we present AS-PoW, a dual-level proof-of-ownership method with efficient data updating. AS-PoW runs a challenge-and-response protocol that combines cuckoo-filter verification with algebraic signatures of chunks to avoid privacy leakage, and it locates modified data precisely so that only the modified chunks are uploaded, reducing the update cost. Concretely, AS-PoW first builds a cuckoo filter over the ciphertexts of chunks; the storage server randomly selects the indices of multiple challenged chunks and checks the client's responses against the cuckoo filter. If this first-level verification passes, the server verifies the homomorphism of the algebraic signatures of the challenged chunks: it computes the signature of the sum of the challenged chunks and checks whether it equals the sum of their individual signatures (a simplified sketch of this check is given below). The earlier BF-PoW scheme uses a fixed-size chunking algorithm and does not support deletion, which makes data updating costly; AS-PoW instead uses a content-defined chunking algorithm to detect updated data quickly, uploads only the modified chunks, and updates the cuckoo filter accordingly. Security analysis demonstrates that AS-PoW resists hash-as-a-proof attacks and ensures data privacy. Experimental results show that AS-PoW reduces updating time by 64.5%-66.5% relative to BF-PoW.

When files are accessed in a deduplication-based cloud storage system, access control protects the data shared among multiple users. To address the heavy computation cost for small files and the inflexibility of user revocation, we present EDedup, a similarity-aware encrypted deduplication scheme that supports flexible access control with revocation. EDedup exploits data similarity to group small files into segments, which reduces encryption time (sketched below), and applies proxy-based attribute-based encryption to file metadata to enable flexible user revocation and reduce metadata storage overheads. Traditional methods choose the min-hash as the representative hash, which leaks privacy; EDedup instead performs source-based similar-segment detection to obtain the representative hash, applies a server-aided encryption algorithm at the segment level, and combines this with target-based duplicate-chunk checking to avoid privacy leakage. To reduce metadata overheads and support flexible user revocation, EDedup keeps only one copy of metadata for duplicate files and has a proxy re-encrypt data when users are revoked. Evaluation results demonstrate that EDedup encrypts data 36% faster than SecDep and reduces metadata storage overheads by 39.9%-65.7% relative to REED.

To address the poor scalability and the large storage overheads of availability-enhancement methods in deduplication-based cloud storage systems, we propose DARM, a deduplication-aware and low-overhead data redundancy method. DARM exploits deduplication semantics, such as inter-/intra-file duplication, chunk size, and reference count, to mark the state of each chunk. Analysis of these semantics shows that inter-file and highly-referenced chunks need stronger reliability guarantees but occupy only a small fraction of physical storage. DARM therefore employs Selective and Dynamic Chunk-based Replication (SDCR), which increases the number of copies of inter-file and highly-referenced chunks to enhance availability and improve the scalability of the redundancy scheme, and uses erasure coding for unique and low-referenced chunks to balance availability and space efficiency. Experimental results on real-world datasets show that DARM reduces storage overheads by up to 43.4% while losing only a small amount of data availability compared with Deep Store.
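The second-level AS-PoW verification mentioned above rests on the homomorphism of algebraic signatures: the signature of the sum of the challenged chunks equals the sum of their signatures. The sketch below demonstrates that property with a simplified linear signature over a prime field; real algebraic signatures are defined over a Galois field GF(2^w), and the modulus, base, chunk representation (equal-length lists of byte values), and function names here are illustrative assumptions.

    P = 2**61 - 1   # assumed prime modulus for the simplified signature
    ALPHA = 7       # assumed signature base

    def signature(values):
        """Linear signature sig(v) = sum_i v[i] * ALPHA**i mod P."""
        sig, power = 0, 1
        for v in values:
            sig = (sig + v * power) % P
            power = (power * ALPHA) % P
        return sig

    def client_response(local_chunks, challenged_indices):
        # The prover sums the challenged chunks element-wise and returns the
        # signature of that sum: one constant-size value per challenge.
        length = len(local_chunks[challenged_indices[0]])
        summed = [sum(local_chunks[i][j] for i in challenged_indices) % P
                  for j in range(length)]
        return signature(summed)

    def server_verify(stored_signatures, challenged_indices, response):
        # By linearity, sig(sum of chunks) equals the sum of the per-chunk
        # signatures, which the server already stores.
        return response == sum(stored_signatures[i] for i in challenged_indices) % P

    # Toy run with equal-length chunks.
    chunks = [[(i * 31 + j) % 256 for j in range(64)] for i in range(8)]
    stored = [signature(c) for c in chunks]
    assert server_verify(stored, [1, 4, 6], client_response(chunks, [1, 4, 6]))

A client that only knows the file hash cannot compute the correct response, which is exactly what the proof-of-ownership check requires.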
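The small-file saving in EDedup comes from grouping files into segments so that the expensive server-aided key generation runs once per segment rather than once per file. Below is a minimal sketch of that grouping step under stated assumptions: the 4 MiB segment target, the SHA-256 fingerprints, and the helper names are placeholders, and EDedup's actual source-based similar-segment detection selects its representative hash differently from this toy fingerprint.

    import hashlib

    SEGMENT_TARGET = 4 * 1024 * 1024   # assumed segment size target (4 MiB)

    def group_small_files(files):
        """Pack (name, data) pairs into segments of roughly SEGMENT_TARGET bytes."""
        segments, current, size = [], [], 0
        for name, data in files:
            current.append((name, data))
            size += len(data)
            if size >= SEGMENT_TARGET:
                segments.append(current)
                current, size = [], 0
        if current:
            segments.append(current)
        return segments

    def segment_fingerprint(segment):
        # One representative fingerprint per segment: the server-aided key
        # request (and its round trip) is amortized over every file inside.
        h = hashlib.sha256()
        for _, data in segment:
            h.update(hashlib.sha256(data).digest())
        return h.digest()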
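DARM's per-chunk decision, replicate what many files depend on and erasure-code the rest, can be summarized in a few lines. The reference-count threshold, the metadata fields, and the function name below are assumptions for illustration; the abstract does not state DARM's actual parameters.

    from dataclasses import dataclass

    @dataclass
    class ChunkMeta:
        ref_count: int     # number of file references to this chunk
        inter_file: bool   # shared by more than one distinct file

    def choose_redundancy(meta: ChunkMeta, ref_threshold: int = 4) -> str:
        # Losing an inter-file or highly-referenced chunk damages many files,
        # so SDCR keeps extra replicas of it; unique and low-referenced chunks
        # are erasure coded to keep storage overheads down.
        if meta.inter_file or meta.ref_count >= ref_threshold:
            return "replicate"
        return "erasure-code"

Because the critical chunks occupy only a small fraction of physical storage, the extra replicas cost little, while the erasure-coded majority keeps overall space overheads low.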
Keywords: Data Deduplication, Storage System, Convergent Encryption, Proof-of-Ownership, Access Control, Data Availability