Font Size: a A A

Compressed pattern matching for text and images

Posted on:2006-01-31Degree:Ph.DType:Dissertation
University:University of Central FloridaCandidate:Tao, TaoFull Text:PDF
GTID:1458390008969811Subject:Computer Science
Abstract/Summary:
In this dissertation, we have studied compressed pattern matching problem for both text and images. We present a series of novel compressed pattern matching algorithms, which are divided into two major parts. The first major work is done for the popular LZW compression algorithm. The second major work is done for the current lossless image compression standard JPEG-LS. Specifically, our contributions from the first major work are: (1) We have developed an "almost-optimal" compressed pattern matching algorithm that reports all pattern occurrences. An earlier "almost-optimal" algorithm reported in the literature is only capable of detecting the first occurrence of the pattern and the practical performance of the algorithm is not clear. We have implemented our algorithm and provide extensive experimental results measuring the speed of our algorithm. We also developed a faster implementation for so-called "simple patterns". The simple patterns are patterns that no unique symbol appears more than once. (2) We have developed a novel compressed pattern matching algorithm for multiple patterns using the Aho-Corasick algorithm. The algorithm takes O(mt + n + r) time with O(mt) extra space, where n is the size of the compressed file, m is the total size of all patterns, t is the size of the LZW trie and r is the number of occurrences of the patterns.; All the above algorithms have been implemented and extensive experiments have been conducted to test the performance of our algorithms and to compare with the best existing algorithms. The experimental results show that our compressed pattern matching algorithm for multiple patterns is competitive among the best algorithms and is practically the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. LZW is one of the most efficient and popular compression algorithms used extensively and both of our algorithms require no modification on the compression algorithm. Our work, therefore, has great economical and market potential.; Our contributions from the second major work are: (1) We have developed a new global context variation of the JPEG-LS compression algorithm and the corresponding compressed pattern matching algorithm. Comparing to the original JPEG-LS, the global context variation is search-aware and has faster encoding and decoding speeds. (2) We have developed a two-pass variation of the JPEG-LS algorithm and the corresponding compressed pattern matching algorithm. The two-pass variation achieves search-awareness through a common compression technique called semi static dictionary. (Abstract shortened by UMI.)...
Keywords/Search Tags:Compressed pattern matching, Algorithm, Compression, Major work, Variation, JPEG-LS
Related items