Research On Lossless Compression Approach Based On Local Similarity

Posted on:2018-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Han

Full Text:PDF

GTID:2428330569485419

Subject:Computer technology

Abstract/Summary:

With the rapid development of electronic information technology in recent years, the world has entered the era of big data. Data compression, a mainstream data reduction technology, is widely used in storage systems to eliminate redundant data and thus save storage space. Traditional lossless compression approaches mainly detect redundant data within a sliding window. This leads to suboptimal compression performance because the window size limits the detection range, so redundancy across different compression windows cannot be eliminated. Unlike traditional compression, data deduplication removes redundant data at the chunk level or file level. Duplicate chunks are detected by matching fingerprints, which avoids the costly byte-by-byte matching used in traditional compression. However, deduplication can only identify and remove files and chunks that are exact duplicates; it fails to detect redundancy among files and chunks that are not identical but very similar. Many studies suggest that combining traditional compression with data deduplication can improve compression performance: once deduplication has removed the duplicate data, similar data can be found by resemblance detection and eliminated by a compressor. Existing resemblance detection, however, is computationally expensive, which results in low compression throughput. To solve this problem, this thesis proposes an efficient compression approach based on local similarity. It combines deduplication with traditional compressors to increase both compression ratio and efficiency. Specifically, the approach makes full use of deduplication to (1) accelerate data reduction through fast, global deduplication and (2) exploit data locality to compress similar chunks by clustering the data chunks that are adjacent to the same duplicate chunks. This simplifies resemblance detection and improves compression throughput. Experimental results on real-world datasets show that the approach increases the compression ratio by 20% to 71% and the compression throughput by 17% to 183%, effectively improving the compression performance of traditional compressors.
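The two steps named in the abstract, fingerprint-based deduplication and clustering of chunks that sit next to the same duplicate chunk, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The fixed 4 KiB chunk size, the use of SHA-1 fingerprints and zlib, the choice to look only at the chunk that immediately follows a duplicate, and all function names (chunk, fingerprint, dedup_and_cluster, compress_clustered) are assumptions made for illustration.

```python
import hashlib
import zlib
from collections import defaultdict

CHUNK_SIZE = 4096  # assumption: fixed-size chunking; the thesis may chunk differently


def chunk(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(c):
    """Fingerprint a chunk so duplicates are found by hash lookup, not byte matching."""
    return hashlib.sha1(c).hexdigest()


def dedup_and_cluster(chunks):
    """Deduplicate chunks and group new chunks by the chunk that precedes them.

    Returns (store, clusters):
      store    maps fingerprint -> bytes for each unique chunk
      clusters maps a neighbour's fingerprint -> fingerprints of the unique
               chunks first seen right after that neighbour
    A cluster gets more than one member only when its neighbour occurs more
    than once, i.e. when the neighbour is a duplicate chunk.
    """
    store = {}
    clusters = defaultdict(list)
    prev_fp = None
    for c in chunks:
        fp = fingerprint(c)
        if fp not in store:          # new chunk: keep it and record its neighbour
            store[fp] = c
            if prev_fp is not None:
                clusters[prev_fp].append(fp)
        prev_fp = fp                 # duplicates are skipped but still act as neighbours
    return store, clusters


def compress_clustered(store, clusters):
    """Compress each multi-member cluster as one stream, everything else alone."""
    out = []
    grouped = set()
    for members in clusters.values():
        if len(members) > 1:
            out.append(zlib.compress(b"".join(store[fp] for fp in members)))
            grouped.update(members)
    for fp, c in store.items():
        if fp not in grouped:
            out.append(zlib.compress(c))
    return out
```

In this sketch, a stream such as A, B1, C, A, B2 stores A only once, and B1 and B2 land in the same cluster because both follow the duplicate chunk A. If B2 is a slightly edited copy of B1, compressing the two together lets the compressor encode B2 mostly as back-references to B1, with no separate resemblance-detection pass.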

Keywords/Search Tags:

Traditional Lossless Compression, Data Deduplication, Data Locality

Related items

1	Research On Data Deduplication Based On File Access Patterns
2	Research On Duplicate Data Detection In Data Deduplication
3	Lossless Compression Technology In The Posture Data Acquisition System
4	Research On High Performance Redundancy Elimination Techniques For Data Backup Systems
5	The Research Of Real-time Lossless Data Compression Technology Based On DSP
6	Lossless Data Compression, Algorithms Comparisons And Implementation
7	The Research On A Lossless Data Compression Algorithm
8	Research On Key Technology In Mass Data Processing Based On Inline Deduplication
9	Research Of Data Deduplication In Data Disaster Tolerance Systems
10	Design And Implementation Of On-Board Image Compression System With Interframe Lossless And Near-Lossless Method