Font Size: a A A

Improved Pattern Matching Algorithms For LZ Compressed Texts

Posted on:2018-02-02Degree:MasterType:Thesis
Country:ChinaCandidate:T X ManFull Text:PDF
GTID:2348330515473973Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the big data era coming,people solve all kinds of problems with the massive data which contains huge number of information.The reason is that the amount of data is very huge.However,in this process we must encounter many difficulties.Such as data storage,data query,data extraction,etc.This paper is to study and discuss the fundamental problems of data processing.How to process pattern matching on some compressed text files directly.This paper focuses on the most popular lossless compression scheme LZ78 algorithm and LZW algorithm.Because both of them have similar features in the compressed form,it is convenient for some pattern matching algorithms.Therefore,this article will improve some search algorithms with these characteristics,so that they can be directly used on the compressed text search.In this paper,we improve BM and some more simple and effective algorithm which come from BM such as Sunday algorithm and Horspool algorithm.These algorithms are different from the traditional word by word matching method.In the matching processing they move the matching window as far as possible so as to speed up the matching speed and improve the performance of the algorithm.We compared these three improved algorithm with the traditional decompression-search algorithm and obtained the great increase in the speed of the implementation.At the same time as the direct search algorithm does not need to decompress files,we saved a lot of storage space so that it is convenient for storage and transmission.Comparing these three improved search algorithms,they also get their own characteristics.Compared with the improved BM algorithm,the matching process of BM algorithm is more complex and the performance is slightly lower.In the Sunday algorithm we select the symbol which is the first one outside the window,so because of the LZ series of compressed text storage characteristics,Sunday algorithm has the requirements of the matching sequence,therefore,its matching performance can't achieve the best.Relatively speaking,the improved Horspool algorithm is more simple and effective.It can also choose a different sequence of matching for different text and pattern string,so the performance is more efficient.
Keywords/Search Tags:LZ78, LZW, BM algorithm, Horspool algorithm, Sunday algorithm
PDF Full Text Request
Related items