Font Size: a A A

An Efficient Data Deduplication Design with Flash-Memory Based Solid State Drive

Posted on:2013-12-31Degree:Ph.DType:Dissertation
University:University of MinnesotaCandidate:Lu, GuanlinFull Text:PDF
GTID:1458390008484297Subject:Computer Science
Abstract/Summary:
Today, a predominant portion of Internet services (e.g., content delivery networks, online backup storage, news broadcasting, blog sharing and social networks) are data centric. A significant amount of new data is generated by these services every day and a large portion of this created data is redundant. Data deduplication is a prevailing technique used to identify and eliminate redundant data, so as to reduce the space requirement for both primary file systems and data backups.;The variety of objectives in a deduplication system design is the primary interest of this dissertation. These objects include maximizing the redundant data removed and achieving a high deduplication read/write throughput with a minimum RAM overhead per chunk. To achieve the first objective, this dissertation proposes a novel chunking algorithm that breaks the input dataset into chunks, with a higher redundancy or with larger sizes, so as to identify the more duplicated data without producing larger numbers of chunks, as compared to other chunking algorithms. To achieve high deduplication throughput while minimizing RAM overhead per chunk, this dissertation proposes a RAM frugal chunk index design along with a chunk filter that is used to filter out index lookups on nonexistent chunks. Both index and filter designs efficiently use a very limited RAM space with ash-memory as persistent storage. In particular, the proposed chunk filter design can dynamically scale up to adapt to the growth of the dataset. In addition, the proposed chunk index design could achieve high throughput, low latency chunk lookup/insert operations with extremely low RAM overhead at the sub-byte-per-chunk level.
Keywords/Search Tags:RAM overhead, Data, Deduplication, Chunk, Index
Related items