
Efficient and secure deduplication for cloud-based backups

Posted on: 2016-06-24
Degree: M.S.
Type: Thesis
University: Temple University
Candidate: Wang, Yufeng
Full Text: PDF
GTID: 2478390017484007
Subject: Computer Science
Abstract/Summary:
Backup storage based on cloud services is becoming increasingly popular. Deduplication is a key technique that reduces the transmission and storage overhead of backing up large datasets by identifying multiple copies of redundant data.

Elasticity, the ability to scale computing resources such as memory on demand, is one of the main advantages of cloud computing services. As cloud-based storage grows in popularity, more deduplication-based storage systems will naturally be migrated to the cloud. Existing deduplication systems, however, do not adequately take advantage of elasticity.

In this thesis, we show how elasticity can be used to improve deduplication-based systems, and we propose EAD (elasticity-aware deduplication), an indexing algorithm that uses the ability to dynamically increase memory resources to improve overall deduplication performance. Our experimental results indicate that EAD detects more than 98% of all duplicate data while consuming less than 5% of the expected memory space. Compared with the state-of-the-art sampling technique, it achieves four times the deduplication efficiency while using less than half as much memory.

Furthermore, as data in data centers grows rapidly, a single storage node can no longer provide the expected throughput and capacity. Building deduplication clusters is considered a promising strategy for alleviating this single-node bottleneck. However, deduplication depends on how much the system knows about previously stored data. A single-node system has all of this information and can therefore detect any duplicate data, whereas a storage node in a cluster cannot see what is stored on the other nodes. It is nontrivial to route data intelligently enough that a cluster achieves deduplication performance comparable to that of a single-node system while keeping the routing cost small.
Thus, we propose an elastic data routing strategy that aims to achieve deduplication performance comparable to the state of the art while requiring far fewer computational resources.

Going a step further, deduplication as currently adopted by cloud backup providers is vulnerable to side-channel attacks. Traditional defenses in cloud computing can prevent such attacks, but they cannot be used together with deduplication. We therefore explore the impact of encryption on data uploaded to the cloud and propose a solution for cloud-based backup services that combines deduplication and encryption to provide both security and high bandwidth efficiency. Extensive experiments on a real-world dataset show that our solution incurs only a small overhead compared with native deduplication while offering strong security protection.
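The abstract does not describe EAD or the elastic routing strategy in enough detail to reproduce them. As a point of reference only, the Python sketch below shows the generic baseline both ideas build on: fixed-size chunking, SHA-256 chunk fingerprints, a full in-memory fingerprint index per node, and a simple stateless rule that routes each chunk to a cluster node by its fingerprint. The chunk size and all names (`DedupNode`, `route`, `backup`, `restore`) are hypothetical assumptions for illustration; this is not the thesis's algorithm.

```python
import hashlib
from typing import Dict, Iterator, List

CHUNK_SIZE = 4096  # assumed fixed-size chunks; many systems use content-defined chunking instead


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split a backup stream into fixed-size chunks (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 digest of its content."""
    return hashlib.sha256(chunk).hexdigest()


class DedupNode:
    """Hypothetical storage node with a full in-memory fingerprint index."""

    def __init__(self) -> None:
        self.index: Dict[str, bytes] = {}  # fingerprint -> stored chunk
        self.logical_bytes = 0   # bytes clients asked to back up
        self.physical_bytes = 0  # bytes actually stored after deduplication

    def put(self, fp: str, chunk: bytes) -> None:
        self.logical_bytes += len(chunk)
        if fp not in self.index:  # only previously unseen content is stored
            self.index[fp] = chunk
            self.physical_bytes += len(chunk)


def route(fp: str, num_nodes: int) -> int:
    """Stateless routing: identical chunks always map to the same node."""
    return int(fp[:8], 16) % num_nodes


def backup(data: bytes, nodes: List[DedupNode]) -> List[str]:
    """Store data across the cluster and return its recipe (ordered fingerprints)."""
    recipe = []
    for chunk in split_chunks(data):
        fp = fingerprint(chunk)
        nodes[route(fp, len(nodes))].put(fp, chunk)
        recipe.append(fp)
    return recipe


def restore(recipe: List[str], nodes: List[DedupNode]) -> bytes:
    """Reassemble the original data from its recipe."""
    return b"".join(nodes[route(fp, len(nodes))].index[fp] for fp in recipe)


if __name__ == "__main__":
    cluster = [DedupNode() for _ in range(4)]
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # four chunks, two unique
    recipe = backup(data, cluster)
    assert restore(recipe, cluster) == data
    logical = sum(n.logical_bytes for n in cluster)
    physical = sum(n.physical_bytes for n in cluster)
    print(f"logical={logical} physical={physical}")  # logical=16384 physical=8192
```

Because every copy of a chunk is routed by its own fingerprint, this toy cluster finds exactly the duplicates a single node would. It ignores the costs the abstract is concerned with: the full index grows with the number of unique chunks (the memory pressure that sampling techniques and EAD aim to relieve), and per-chunk routing scatters a backup stream across nodes, which is why practical cluster designs typically route larger units and accept some missed duplicates. The sketch also omits encryption, so it does not illustrate the secure deduplication scheme described above.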
Keywords/Search Tags: Deduplication, Cloud, Data, Storage