Font Size: a A A

An Adaptive Compression Algorithm Supporting Query

Posted on:2021-04-01Degree:MasterType:Thesis
Country:ChinaCandidate:C J ZhaoFull Text:PDF
GTID:2428330623469208Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the development of computer technology,the amount of data is exploding,data compression as a technique to reduce the data storage space,can effectively ease the pressure of storage and transmission,has been widely used in various systems.Simultaneously,the mining of valid information on massive data is a current research hotspot,and data mining and analysis usually require operations based on random access to the data and data queries.However,existing traditional data compression algorithms are usually designed only to eliminate data redundancy and do not support random access,and other retrieval operations on compressed data and efficient index structures are also usually designed only for raw data and cannot be applied to the compressed data.Therefore,this thesis proposes an adaptive lossless compression framework supporting data queries: segmenting the data according to pre-determined delimiters,estimating the probability distribution of the data using the context model,compressing the data by arithmetic coding,and finally storing the encoding results in chunks.Among them,the adaptive,contextual model improves the compression by training on cold data and updating the model periodically to obtain an accurate probability distribution.In terms of storage strategy,this thesis chunked the encoded data based on the arithmetic encoding feature to support sequential queries without decompression,with query types including exact,prefix,and range queries.And,for more efficient querying,the algorithm in this thesis provides a convenient interface to support the integration of mainstream index structures,and the query effect verification is performed using a b+ tree as an example.The experimental results show that the compression effect of the algorithm in this thesis reaches the level of mainstream compression algorithms,and the sequential query speed is 24.8 times that of LZO,the fastest algorithm in comparison algorithms.At the same time,this thesis implements the integration of b + tree index.On the compressed data,the index query speed is 172.5 times that of the sequential query.
Keywords/Search Tags:lossless compression, arithmetic encoding, random access
PDF Full Text Request
Related items