Font Size: a A A

A Study Of Data Compression Based On BWT And Wavelet Tree

Posted on:2015-05-01Degree:MasterType:Thesis
Country:ChinaCandidate:H ZhaoFull Text:PDF
GTID:2308330464468848Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the fast development of internet and other new scientific fields, it produces more and more data which contains much useful and important information all over the world. People have started do some research and analysis on the large scale data to find their interested information. But it has been a big challenge to store the big data because of the scale of data. Also it takes a lots of time and network bandwidth to transmit data between computers on the internet. So the ways to compress data become more and more necessary and flourishing.Firstly, we analyze the methods to compute BWT based on suffix array and the property of BWT. Then we propose a modified algorithm to compute LF at the time of BWT inverse transform. Secondly, we do some research on the different rank/select data structures of bits sequence and we analyze the hierarchical effect on space size of various rank/select data structures of wavelet tree. At last, based on BWT and wavelet tree we propose an algorithm to compress and decompress data and some ideas to parallel the algorithm. By defining our own compressed file format, we efficiently develop data compression software called wzip.By doing some experiments, we test the influence of wavelet tree shape, inner nodes compression method and block size on wzip’s compression ratio. We discover that hu-taker shaped wavelet tree with run-length γ to compression its inner nodes will get the best compression ratio under the condition of fixed block size of BWT transform. By comparing wzip with bzip2, gzip, rar, zip, 7z and compress in the aspects of compression ratio, compression time and decompression time, we find that wzip could get a good compression ratio with fast compression speed. The source code is available online(https://github.com/goldbeef/wzip).
Keywords/Search Tags:data compression, BWT transform, wavelet tree
PDF Full Text Request
Related items