
Research On Key Technologies In Metadata Management Of Data Deduplication For Massive Data

Posted on: 2016-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Zhou
GTID: 1108330503456506
Subject: Computer Science and Technology

Abstract:
With the explosive growth of global data volumes, data deduplication has been widely adopted in storage and network transmission systems. Because deduplication touches a broad range of technical issues, this dissertation focuses on several key technologies in deduplication metadata management: metadata harnessing, caching, communication, and deduplication with adaptive block skipping. The aim is to reduce the time and space overhead of deduplication and to strike a better trade-off between duplicate-elimination efficiency on one hand and metadata volume and deduplication throughput on the other, so as to meet the demands of rapid data growth and high-performance computing. The contributions of this dissertation include:

- A metadata harnessing algorithm based on hysteresis hash re-chunking. On the one hand, with data locality preserved, runs of consecutive small-granularity hash indices are merged into a single large-granularity hash index, shrinking the hash-index portion of the deduplication metadata. On the other hand, during deduplication, a hysteresis hash re-chunking step splits large-granularity indices back apart at the detected edges of duplicate data slices, keeping the indexing efficiency of the metadata high (a sketch follows this abstract).

- A metadata write-caching algorithm coupled with the metadata harnessing algorithm. Newly produced metadata are cached efficiently in a metadata write cache, which raises the RAM hit ratio of deduplication metadata, relieves the metadata-related disk I/O bottleneck, and thereby improves deduplication throughput (see the second sketch below).

- A metadata feedback algorithm, combined with the metadata harnessing scheme, for data deduplication across wide-area networks. Guided by data-locality information, selected metadata are piggy-backed from the receiver to the sender so that subsequent duplicates can be resolved at the sender, reducing the number of duplicate query/answer operations and their time overhead and thus improving deduplication throughput across the WAN (see the third sketch below).

- A data deduplication framework with adaptive block skipping. Chunks heuristically judged to be non-duplicates bypass the deduplication process entirely, so their deduplication overhead is saved; this improves the trade-off between duplicate-elimination efficiency, metadata volume, and deduplication throughput (see the fourth sketch below).
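A minimal Python sketch of the merge-and-re-chunk idea in the first contribution. This is not the dissertation's implementation: MERGE_SPAN, fingerprint, merge_indices, and rechunk_at_edge are illustrative names I have introduced, and SHA-1 merely stands in for whatever fingerprinting function the actual system uses.

```python
import hashlib

MERGE_SPAN = 4  # assumed: small-granularity indices merged per large index

def fingerprint(data: bytes) -> str:
    # SHA-1 stands in for whatever hash the actual system uses
    return hashlib.sha1(data).hexdigest()

def merge_indices(chunk_hashes):
    """Merge runs of consecutive small-granularity hashes into one
    large-granularity hash, preserving on-disk order (data locality).
    Member hashes are kept so the index can be re-chunked later."""
    merged = []
    for i in range(0, len(chunk_hashes), MERGE_SPAN):
        run = chunk_hashes[i:i + MERGE_SPAN]
        merged.append((fingerprint("".join(run).encode()), run))
    return merged

def rechunk_at_edge(merged_entry, edge):
    """Hysteresis re-chunking: when the edge of a detected duplicate slice
    falls inside a large-granularity index, split it back into its
    small-granularity members so indexing efficiency is preserved."""
    _, members = merged_entry
    return members[:edge], members[edge:]

# usage: merge eight chunk hashes, then split the first large index at member 2
large = merge_indices([fingerprint(bytes([b])) for b in range(8)])
left, right = rechunk_at_edge(large[0], 2)
```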
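A second sketch, for the metadata write cache, assuming an LRU-style RAM cache with batched eviction; the class name, the capacity and batch parameters, and the write_to_disk_index hook are all hypothetical stand-ins, not the dissertation's API.

```python
from collections import OrderedDict

def write_to_disk_index(entries):
    """Hypothetical persistence hook: append evicted (fingerprint, location)
    pairs to the on-disk index as one sequential batch write."""
    pass

class MetadataWriteCache:
    """Write-back cache for newly produced deduplication metadata: new
    fingerprints stay in RAM and are flushed in batches, raising the RAM
    hit ratio and cutting index-related random disk I/O."""

    def __init__(self, capacity=65536, batch=1024):
        self.capacity, self.batch = capacity, batch
        self.entries = OrderedDict()  # fingerprint -> chunk location

    def lookup(self, fp):
        if fp in self.entries:
            self.entries.move_to_end(fp)  # keep hot metadata resident
            return self.entries[fp]
        return None  # caller falls back to the on-disk index

    def insert(self, fp, location):
        self.entries[fp] = location
        self.entries.move_to_end(fp)
        if len(self.entries) > self.capacity:
            self.flush_batch()

    def flush_batch(self):
        """Evict the coldest entries as one sequential batch write."""
        n = min(self.batch, len(self.entries))
        write_to_disk_index([self.entries.popitem(last=False) for _ in range(n)])
```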
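A third sketch, for the WAN query/answer protocol with piggy-backed metadata feedback. The LocalityIndex class, the window parameter, and the send_query callback are assumptions for illustration; the dissertation's actual locality unit and metadata-selection policy may differ.

```python
class LocalityIndex:
    """Hypothetical receiver-side index that remembers, for each fingerprint,
    the other fingerprints stored in the same locality unit."""

    def __init__(self):
        self.units = {}  # fingerprint -> list of fingerprints in its unit

    def add_unit(self, fps):
        for fp in fps:
            self.units[fp] = fps

    def __contains__(self, fp):
        return fp in self.units

    def neighbors(self, fp, window):
        return self.units[fp][:window]

def receiver_answer(query_fps, index, window=32):
    """Answer a duplicate query and piggy-back locality-chosen metadata so
    the sender can resolve future duplicates without another round trip."""
    answer, piggyback = {}, set()
    for fp in query_fps:
        answer[fp] = fp in index
        if answer[fp]:
            piggyback.update(index.neighbors(fp, window))
    return answer, piggyback

def sender_dedup(chunks, known, send_query):
    """Consult piggy-backed hints first; only unknown fingerprints cross the
    WAN. Returns the chunk payloads that actually need to be transmitted."""
    unknown = [fp for fp, _ in chunks if fp not in known]
    if unknown:
        answer, piggyback = send_query(unknown)
        known.update(piggyback)
        known.update(fp for fp, hit in answer.items() if hit)
    return [data for fp, data in chunks if fp not in known]
```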
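A fourth sketch, for adaptive block skipping. The SKIP_TRIGGER and SKIP_WINDOW thresholds, and this exact miss-streak heuristic, are assumptions of mine; the dissertation's rule for judging chunks to be non-duplicates may be more elaborate, but the overhead saving works the same way: skipped chunks incur neither an index lookup nor a metadata insert.

```python
SKIP_TRIGGER = 8   # assumed: consecutive misses that arm skipping
SKIP_WINDOW = 16   # assumed: chunks skipped per trigger

def dedup_with_skipping(chunks, index):
    """Deduplicate (fingerprint, data) pairs, but once SKIP_TRIGGER chunks in
    a row are non-duplicates, treat the next SKIP_WINDOW chunks as unique and
    skip their lookups and metadata inserts entirely."""
    miss_streak, to_skip, skipped, stored = 0, 0, 0, []
    for fp, data in chunks:
        if to_skip:
            to_skip -= 1
            skipped += 1
            stored.append(data)      # skipped: no lookup, no metadata produced
            continue
        if fp in index:
            miss_streak = 0          # duplicate region: full deduplication resumes
        else:
            index.add(fp)
            stored.append(data)
            miss_streak += 1
            if miss_streak == SKIP_TRIGGER:
                miss_streak, to_skip = 0, SKIP_WINDOW
    return stored, skipped

# usage: index is a plain set of fingerprints
stored, skipped = dedup_with_skipping([("fp%d" % i, b"...") for i in range(40)], set())
```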
Keywords: Storage, Transmission, Massive Data, Data Deduplication, Metadata, High Performance