Font Size: a A A

Research On Entropy Coding Algorithms

Posted on:2022-08-18Degree:DoctorType:Dissertation
Country:ChinaCandidate:N WangFull Text:PDF
GTID:1488306323482434Subject:Cyberspace security
Abstract/Summary:PDF Full Text Request
With the advent of the information age,data compression has become an indispens-able part in data storage and transmission due to the large amount of data and limited computer storage and network bandwidth.Data compression is a technology to reduce the storage capacity of data and improve the utilization of computer resources on the basis of ensuring important information.Entropy coding is an effective method of data compression.Most compression systems for image,speech and video use adaptive pre-dictors or decorrelation transforms to map blocks of the original data into low-entropy blocks of integers to facilitate entropy coding.Shannon coding,Huffman coding,Lempel-Ziv coding,Golomb coding and ANS coding are commonly used in entropy coding.Golomb coding is a lossless data com-pression method,when the symbol probability in the input sequence obeys the geometric distribution,the optimal prefix code can be obtained by gololmb coding.Given a posi-tive integer N to be encoded,Golomb coding uses a parameter M to divide it into two parts:quotient part and remainder part.When M is not a power of two,the codewords of these two parts are both coded with variable-length coding.Compared with fixed-length coding,variable-length coding has higher encoding and decoding complexity and slower implementation speed.In addition,Golomb coding can only use binary encoding,and there is no literature exploring arbitrary n-ary Golomb coding.ANS is a new entropy coding method.It has two main implementations:range ANS(rANS)and tabled ANS(tANS).Among them,rANS requires some complex arith-metic operations,including integer multiplication and integer division,and the compu-tational complexity is relatively high.By pinning the entire behavior(including renor-malization)into a lookup table to avoid multiplication,tANS achieves the compression performance of arithmetic coding at the same coding speed as Huffman coding.How-ever,tANS needs a lot of space to store the whole look-up table,which requires high memory demand on CPU.Golomb coding and ANS are both variable-length coding,the main limitation of variable-length coding is the lack of effective random access.In the field of database,we often encounter the need to access or modify an entry randomly in the form of compres-sion.The current domestic and foreign research in this area can be roughly divided into two categories:(1)random access mechanism based on sampling;(2)random access mechanism based on auxiliary data structure.However,the existing solutions require additional space,the use of extra space will lose part of the compression performance.This paper mainly studies the entropy coding technologies in text compression,including Golomb coding,ANS coding and random access in variable-length coding.First,this paper points out the drawbacks of traditional Golomb coding,and then pro-poses a new variant of Golomb coding that always encodes the remainder of the code-word with a fixed-length coding for any M.In addition,this paper extends the binary Golomb coding and the proposed coding to arbitrary n-ary.Then,a tANS algorithm with a smaller lookup table space is proposed,which avoids the high complexity of multiplication operation and solves the problem of large memory occupation.Finally,without loss of compression performance,this paper proposes a new solution that does not require auxiliary space and supports random access,which significantly improves the random access efficiency of the algorithm.The contributions of this paper are as follows:1.In this paper,a variant of Golomb coding is proposed.Without loss of com-pression performance,it always encodes the remainder of the codeword with a fixed-length coding for any M,which significantly reduces the complexity of encoding and decoding.In addition,n-ary Golomb coding and n-ary proposed coding are proposed.The experimental results show that,the proposed coding scheme involves approximately 20%fewer addition operations,40%fewer mul-tiplication operations during encoding and 40%fewer addition operations,10%fewer multiplication operations,50%fewer bitwise operations during decoding than Golomb coding.2.In this paper,a simplified tANS algorithm with smaller lookup table space is proposed,which can avoid multiplication and reduce the space occupied by the lookup table.In addition,a decoding algorithm which can decode multiple sym-bols at once is proposed.The experimental results show that,with a slight loss of compression ratio(approximately 0.5%lower),the proposed method has up to a 25%(60%)better throughput than rANS during encoding(decoding).3.This paper proposes a method for rearranging the prefix code stream,which can support efficient random access without using additional space.Experimental results show that compared with the traditional method,the proposed scheme reduces the number of bits that need to be read when randomly accessing an el-ement by approximately two orders of magnitude.In addition,for prefix codes whose codeword length can be determined by its prefix,such as Canonical Huff-man coding,the proposed method can further improve the efficiency of random access.
Keywords/Search Tags:Entropy coding, Golomb coding, ANS coding, Random access
PDF Full Text Request
Related items