Font Size: a A A

PATTERN MATCHING WITH APPLICATION TO BINARY IMAGE COMPRESSION

Posted on:1984-11-05Degree:Ph.DType:Thesis
University:Stanford UniversityCandidate:MOHIUDDIN, KOTTAPPURAM MOHAMMED ALIFull Text:PDF
GTID:2478390017962739Subject:Engineering
Abstract/Summary:
Image compression based on pattern matching is a complex technique for achieving large amounts of compression. It is particularly suited for scanned document images. A library of prototype patterns is dynamically created for each scanned page and input patterns are encoded with respect to matching prototypes. In this thesis, we study various pattern matching and encoding techniques and analyze the complexity of computation and the storage requirements.;In critical applications, like banking, no errors can be tolerated. Lossless encoding is proposed as a solution to this problem. A powerful model, based on a two dimensional context consisting of the neighborhood pels from the reference and input patterns, is developed; this provides for a virtual non-causal context. Lossless encoding based on this scheme gives 10% to 20% better compression compared to best published results. We show that the optimum library for lossless compression should consist of those patterns that a human observer would recognize as being distinct.;The complexity of computation is shown to be 0(nm), where 'n' is the number of input patterns and 'm' is the number of prototype patterns in the library. A template matching array is proposed to reduce the processing time from a few minutes to only a fraction of a second. When the input contains more patterns than what the library can accommodate, a strategy is required to replace some of the existing prototype patterns. An optimum replacement policy is formulated; no closed form solution has been found for it. However, because of the strong 'locality' of reference exhibited by document patterns, it is possible to use simple and stable algorithms and still keep the replacement overhead to a minimum.;Conditional information is proposed as the ideal pattern matching measure. In the face of insufficient statistics this criterion leads to incorrect classifications. We derive a generalized Hamming distance by making a modification to the above measure. A simple and effective threshold criterion is developed. The classification accuracy on the test documents is found to be better than 99%. The compression figures obtained are much better than previously published results.
Keywords/Search Tags:Compression, Pattern matching
Related items