Research On Lossless Compression Approach Based On Local Similarity

Posted on: 2018-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Han
GTID: 2428330569485419
Subject: Computer technology
Abstract/Summary:
With the rapid development of electronic information technology in recent years, the world has entered the era of big data. Data compression, a mainstream data-reduction technology, is widely used in storage systems to remove redundant data and thus save storage space. Traditional lossless compression approaches detect redundant data only within a sliding window. Because the window size limits the detection range, redundancy between different compression windows cannot be eliminated, which leads to suboptimal compression performance.

Unlike traditional compression approaches, data deduplication removes redundant data at the chunk or file level. Duplicate chunks are detected by matching fingerprints, which avoids the costly byte-by-byte matching of traditional compressors. However, deduplication can only identify and remove completely identical files and chunks; it fails to detect redundancy among files and chunks that are not identical but very similar. Many studies suggest that combining traditional compression with data deduplication can improve compression performance: once deduplication has removed duplicate data, similar data can be found by resemblance detection and then eliminated by a compressor. However, existing resemblance detection is computationally expensive, resulting in low compression throughput.

To solve this problem, this paper proposes an efficient compression approach based on local similarity, which combines deduplication with traditional compressors to improve both compression ratio and efficiency. Specifically, the approach makes full use of deduplication to (1) accelerate data reduction through fast, global deduplication, and (2) exploit data locality by clustering the data chunks that are adjacent to the same duplicate chunks, so that similar chunks are compressed together. This simplifies resemblance detection and improves compression throughput. Experimental results on real-world datasets show that the local-similarity approach increases the compression ratio by 20% to 71% and speeds up compression throughput by 17% to 183%, effectively improving the performance of traditional compressors.
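The locality idea above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: fixed-size 4 KiB chunking, SHA-1 fingerprints, zlib as the traditional compressor, and the rule "file each new chunk under the fingerprint of the chunk that precedes it" are all assumptions made for illustration, and `deduplicate`, `baseline_size`, `locality_size` and `make_demo_data` are hypothetical names. The demo places an original chunk and a lightly edited copy about 72 KiB apart in the deduplicated stream, farther than zlib's 32 KiB sliding window can reach.

```python
import hashlib
import random
import zlib
from collections import defaultdict

CHUNK_SIZE = 4096  # assumed fixed chunk size (many systems use content-defined chunking)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunking; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes):
    """Return (unique chunks in stream order, locality clusters, duplicate count).

    Duplicates are found by fingerprint lookup instead of byte matching.
    Each unique chunk is filed under the fingerprint of the chunk right
    before it (hypothetical locality rule), so new chunks that follow the
    same repeated chunk land in the same cluster.
    """
    seen = set()
    unique = []
    clusters = defaultdict(list)  # preceding chunk's fingerprint -> unique chunks
    prev_fp = None
    duplicates = 0
    for chunk in split_chunks(data):
        fp = hashlib.sha1(chunk).digest()
        if fp in seen:
            duplicates += 1  # a real system would store only a reference
        else:
            seen.add(fp)
            unique.append(chunk)
            clusters[prev_fp].append(chunk)
        prev_fp = fp
    return unique, clusters, duplicates


def baseline_size(unique: list[bytes]) -> int:
    """Deduplication followed by one zlib stream over the unique chunks."""
    return len(zlib.compress(b"".join(unique)))


def locality_size(clusters) -> int:
    """Deduplication followed by one zlib stream per locality cluster."""
    return sum(len(zlib.compress(b"".join(chunks))) for chunks in clusters.values())


def make_demo_data(seed: int = 0) -> bytes:
    """Two versions of a small file separated by 64 KiB of unrelated bytes."""
    rng = random.Random(seed)
    a, b1, c = (rng.randbytes(CHUNK_SIZE) for _ in range(3))
    b2 = bytearray(b1)
    for i in range(0, CHUNK_SIZE, 256):  # flip 16 bytes: similar to b1, not a duplicate
        b2[i] ^= 0xFF
    filler = rng.randbytes(16 * CHUNK_SIZE)  # wider than zlib's 32 KiB window
    return a + b1 + c + filler + a + bytes(b2) + c


if __name__ == "__main__":
    unique, clusters, dups = deduplicate(make_demo_data())
    print("duplicates:", dups)
    print("dedup + zlib:", baseline_size(unique), "bytes")
    print("dedup + locality clusters + zlib:", locality_size(clusters), "bytes")
```

With the demo data, `deduplicate` reports two duplicate chunks and places the original and the edited chunk in the same cluster, because both follow the same repeated chunk. `locality_size` then comes out smaller than `baseline_size`: compressing the cluster lets zlib match the edited chunk against its original, whereas in the single deduplicated stream the two are too far apart for the window. No expensive resemblance detection (e.g. similarity sketches) is computed; the grouping falls out of the deduplication pass itself.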
Keywords: Traditional Lossless Compression, Data Deduplication, Data Locality