Font Size: a A A

Data Compression Research Based On The Multiple Pattern Matching Algorithm

Posted on:2012-10-05Degree:MasterType:Thesis
Country:ChinaCandidate:X WeiFull Text:PDF
GTID:2178330335974549Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, data compression becomes more and more important, because larger number of data has to be handled and transferred in our daily life. Text data is an important component of the processed data, so the study of text compression has become a focus in data compression field. A compression algorithm based on dictionary is a typical text compression algorithm, so a study of it has a great significance for text data compression. Based on the analysis of the compression algorithm based on dictionary, this paper choose LZSS algorithm to expand our study. This paper is carried as follows:First, in order to improve the compression ratio of files, this paper studied lossless compression algorithms of text files, reviewed the classical lossless compression algorithms and elaborated the principles and characteristics of general compression algorithms. This paper implemented three classic compression algorithms based on dictionary, and did analysis on the basis of experiments. LZSS algorithm is the improvement of LZ77 algorithm, although the compression ratio is not very high, which has been widely recognized in practice because of simplicity and high decompression efficiency. Therefore, LZSS algorithms have being chosen to study, aiming at further improving the compression ratio on the basis of maintaining high decompression speed.Second, based on LZSS, this paper improved string matching process, and proposed a new algorithm--WM_LZSS algorithm by further use of popular multi-pattern matching algorithm--Wu-Manber algorithm.The basic idea of the WM_LZSS algorithm is to use latest correlation of text, multi-level matching, Hash and jump checking to solve the looking back problem existing in the compression of LZSS algorithm, and to lookup in a wider context by means of multi-pattern matching techniques. Matching among multiple patterns at one time, which help to find longer matching information. avoid the unnecessary matching and accelerate the process of matching. As a result, the algorithm gained the higher compression ratio. This paper introduces the key process of compression algorithm based on multiple matching. It is to build shift table and hash table using one matching result, and get pattern library. And then, read fixed size data blocks from the file, do multi-pattern matching preprocessing, and search all the patterns in the pattern library to obtain the matching data. At last, output coding using the data obtained by matching and improves the tree structure.Finally, the paper selected a general text compression file as a test data. The compression ratio was fully tested, from the file type, file size, the size of the smallest model and other aspects. Besides, related compression algorithms are compared. Experiments show that compression ratio of the improved algorithm has been improved obviously. The decompression algorithm is also of fast and simple characteristics, particularly suitable in a case of extracting several times after compression once.
Keywords/Search Tags:lossless data compression, LZSS compression algorithm, patten library, WM multi-pattern matching, preprocessing
PDF Full Text Request
Related items