
Research On Key Technologies In Metadata Management Of Data Deduplication For Massive Data

Posted on: 2016-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Zhou
GTID: 1108330503456506
Subject: Computer Science and Technology

Abstract:
With the explosive growth of global data volumes, data deduplication has been widely adopted in storage and network transmission systems. Because deduplication touches a broad range of technical issues, this dissertation focuses on several key technologies in deduplication metadata management: metadata harnessing, caching, communication, and deduplication with adaptive block skipping. The aim is to reduce the time and space overhead of deduplication and to strike a better trade-off between duplicate-elimination efficiency on one hand and metadata volume and deduplication throughput on the other, so as to meet the demands of rapid data growth and high-performance computing. The contributions of this dissertation include:

- A metadata harnessing algorithm based on hysteresis hash re-chunking. On the one hand, with data locality preserved, runs of consecutive small-granularity hash indices are merged into a single large-granularity hash index, shrinking the hash-index portion of the deduplication metadata. On the other hand, during deduplication, a hysteresis hash re-chunking step splits large-granularity indices back apart at the detected edges of duplicate data slices, keeping the indexing efficiency of the metadata high (a sketch follows this abstract).

- A metadata write-caching algorithm coupled with the metadata harnessing algorithm. Newly produced metadata are cached efficiently in a metadata write cache, which raises the RAM hit ratio of deduplication metadata, relieves the metadata-related disk I/O bottleneck, and thereby improves deduplication throughput (see the second sketch below).

- A metadata feedback algorithm, combined with the metadata harnessing scheme, for data deduplication across wide-area networks. Guided by data-locality information, selected metadata are piggy-backed from the receiver to the sender so that subsequent duplicates can be resolved at the sender, reducing the number of duplicate query/answer operations and their time overhead and thus improving deduplication throughput across the WAN (see the third sketch below).

- A data deduplication framework with adaptive block skipping. Chunks heuristically judged to be non-duplicates bypass the deduplication process entirely, so their deduplication overhead is saved; this improves the trade-off between duplicate-elimination efficiency, metadata volume, and deduplication throughput (see the fourth sketch below).
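A minimal Python sketch of the merge-and-re-chunk idea in the first contribution. This is not the dissertation's implementation: MERGE_SPAN, fingerprint, merge_indices, and rechunk_at_edge are illustrative names I have introduced, and SHA-1 merely stands in for whatever fingerprinting function the actual system uses.

```python
import hashlib

MERGE_SPAN = 4  # assumed: small-granularity indices merged per large index

def fingerprint(data: bytes) -> str:
    # SHA-1 stands in for whatever hash the actual system uses
    return hashlib.sha1(data).hexdigest()

def merge_indices(chunk_hashes):
    """Merge runs of consecutive small-granularity hashes into one
    large-granularity hash, preserving on-disk order (data locality).
    Member hashes are kept so the index can be re-chunked later."""
    merged = []
    for i in range(0, len(chunk_hashes), MERGE_SPAN):
        run = chunk_hashes[i:i + MERGE_SPAN]
        merged.append((fingerprint("".join(run).encode()), run))
    return merged

def rechunk_at_edge(merged_entry, edge):
    """Hysteresis re-chunking: when the edge of a detected duplicate slice
    falls inside a large-granularity index, split it back into its
    small-granularity members so indexing efficiency is preserved."""
    _, members = merged_entry
    return members[:edge], members[edge:]

# usage: merge eight chunk hashes, then split the first large index at member 2
large = merge_indices([fingerprint(bytes([b])) for b in range(8)])
left, right = rechunk_at_edge(large[0], 2)
```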
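A second sketch, for the metadata write cache, assuming an LRU-style RAM cache with batched eviction; the class name, the capacity and batch parameters, and the write_to_disk_index hook are all hypothetical stand-ins, not the dissertation's API.

```python
from collections import OrderedDict

def write_to_disk_index(entries):
    """Hypothetical persistence hook: append evicted (fingerprint, location)
    pairs to the on-disk index as one sequential batch write."""
    pass

class MetadataWriteCache:
    """Write-back cache for newly produced deduplication metadata: new
    fingerprints stay in RAM and are flushed in batches, raising the RAM
    hit ratio and cutting index-related random disk I/O."""

    def __init__(self, capacity=65536, batch=1024):
        self.capacity, self.batch = capacity, batch
        self.entries = OrderedDict()  # fingerprint -> chunk location

    def lookup(self, fp):
        if fp in self.entries:
            self.entries.move_to_end(fp)  # keep hot metadata resident
            return self.entries[fp]
        return None  # caller falls back to the on-disk index

    def insert(self, fp, location):
        self.entries[fp] = location
        self.entries.move_to_end(fp)
        if len(self.entries) > self.capacity:
            self.flush_batch()

    def flush_batch(self):
        """Evict the coldest entries as one sequential batch write."""
        n = min(self.batch, len(self.entries))
        write_to_disk_index([self.entries.popitem(last=False) for _ in range(n)])
```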
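A third sketch, for the WAN query/answer protocol with piggy-backed metadata feedback. The LocalityIndex class, the window parameter, and the send_query callback are assumptions for illustration; the dissertation's actual locality unit and metadata-selection policy may differ.

```python
class LocalityIndex:
    """Hypothetical receiver-side index that remembers, for each fingerprint,
    the other fingerprints stored in the same locality unit."""

    def __init__(self):
        self.units = {}  # fingerprint -> list of fingerprints in its unit

    def add_unit(self, fps):
        for fp in fps:
            self.units[fp] = fps

    def __contains__(self, fp):
        return fp in self.units

    def neighbors(self, fp, window):
        return self.units[fp][:window]

def receiver_answer(query_fps, index, window=32):
    """Answer a duplicate query and piggy-back locality-chosen metadata so
    the sender can resolve future duplicates without another round trip."""
    answer, piggyback = {}, set()
    for fp in query_fps:
        answer[fp] = fp in index
        if answer[fp]:
            piggyback.update(index.neighbors(fp, window))
    return answer, piggyback

def sender_dedup(chunks, known, send_query):
    """Consult piggy-backed hints first; only unknown fingerprints cross the
    WAN. Returns the chunk payloads that actually need to be transmitted."""
    unknown = [fp for fp, _ in chunks if fp not in known]
    if unknown:
        answer, piggyback = send_query(unknown)
        known.update(piggyback)
        known.update(fp for fp, hit in answer.items() if hit)
    return [data for fp, data in chunks if fp not in known]
```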
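A fourth sketch, for adaptive block skipping. The SKIP_TRIGGER and SKIP_WINDOW thresholds, and this exact miss-streak heuristic, are assumptions of mine; the dissertation's rule for judging chunks to be non-duplicates may be more elaborate, but the overhead saving works the same way: skipped chunks incur neither an index lookup nor a metadata insert.

```python
SKIP_TRIGGER = 8   # assumed: consecutive misses that arm skipping
SKIP_WINDOW = 16   # assumed: chunks skipped per trigger

def dedup_with_skipping(chunks, index):
    """Deduplicate (fingerprint, data) pairs, but once SKIP_TRIGGER chunks in
    a row are non-duplicates, treat the next SKIP_WINDOW chunks as unique and
    skip their lookups and metadata inserts entirely."""
    miss_streak, to_skip, skipped, stored = 0, 0, 0, []
    for fp, data in chunks:
        if to_skip:
            to_skip -= 1
            skipped += 1
            stored.append(data)      # skipped: no lookup, no metadata produced
            continue
        if fp in index:
            miss_streak = 0          # duplicate region: full deduplication resumes
        else:
            index.add(fp)
            stored.append(data)
            miss_streak += 1
            if miss_streak == SKIP_TRIGGER:
                miss_streak, to_skip = 0, SKIP_WINDOW
    return stored, skipped

# usage: index is a plain set of fingerprints
stored, skipped = dedup_with_skipping([("fp%d" % i, b"...") for i in range(40)], set())
```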
Keywords: Storage, Transmission, Massive Data, Data Deduplication, Metadata, High Performance