Font Size: a A A

A Rolling Hash Algorithm Based On Carry-less Multiplication And The Implementation To LZ4 Data Compression

Posted on:2022-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:H JiangFull Text:PDF
GTID:2518306323978349Subject:Cyberspace security
Abstract/Summary:PDF Full Text Request
At present,online compression algorithms in storage systems are the main bottleneck that affects end-to-end system performance,and end-to-end CPUs account for a high proportion,which has become one of the pain points in the industry.Among them,the LZ4 compression algorithm is widely used due to its good compression rate,and extremely fast compression and decompression speeds.How to further improve the compression speed of LZ4 without affecting the compression rate and decompression speed is the focus of this thesis.LZ4,as a variant of LZ77,is a dictionary compression algorithm.It scans from the beginning of the file,and then replaces the repeated strings in the dictionary with an index to achieve compression.In order to find these duplications,the LZ4 compressor maintains a hash table,which frequently calculates the hash value during the encoding process.Therefore,the calculation efficiency of the hash function will greatly affect the speed of the compressor.In addition,when a hash collision occurs,LZ4 will directly replace the calculated hash value with the index of the historical string stored in the original table.Thus,the randomness of the hash function will affect the compression rate.In this thesis,we propose a rolling hash function based on carry-less multiplication,and use its algebraic properties to further propose a batch processing scheme for calculating multiple hash values at once.We have conducted theoretical and experimental analysis on the proposed hash function.We prove that the proposed hash is a universal hash,and provide an extended scheme of strong universal hash with stronger theoretical guarantee.In addition,we compared the proposed hash function with the hash function used in LZ4 in terms of speed and randomness,showing the feasibility of the scheme.Finally,we tested and analyzed the above algorithms through simulation experiments.The simulation shows that the encoding throughput of LZ4 has 15.7%improvement in average,and the compress ion ratio isą1%in most cases.
Keywords/Search Tags:Carry-less multiplication, Rolling hash, Universal hashing, LZ4, LZ77, SIMD
PDF Full Text Request
Related items