Document triage using handwriting and machine print segmentation

Posted on: 2010-03-11
Degree: Ph.D
Type: Dissertation
University: Arizona State University
Candidate: Femiani, John C
Full Text: PDF
GTID: 1448390002477881
Subject: Computer Science
Abstract/Summary:
This dissertation proposes techniques to triage document images based on the mode, or type, of each mark in the image, so that specialized downstream processes can be selected according to the content of each image. The following contributions are presented: (1) A new, robust color representation called IHSV, which represents distributions of quantized colors from an RGB color space as intervals in hue, saturation, and value. This representation is used to extract handwriting from images of aerial photographs. (2) A novel stroke graph representation of ribbon-like marks that captures properties such as intensity, direction, and thickness along the trajectory of handwriting. The stroke graph is used as the basis for handwriting and machine print segmentation. The method is shown to have an equal error rate accuracy of 96.8 percent for identifying handwriting on a large corpus of mixed-mode Arabic-language documents. (3) A new scale-invariant shape signature for comparing the contours of words or regions extracted from aerial photos. The new method can find one open curve embedded as part of a longer curve in O(N log N) asymptotic running time, and the cost of part-to-part curve matching is reduced by an order of magnitude, from O(N^4) to O(N^3).
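
The equal error rate figure quoted above can be illustrated with a minimal sketch. This is not the dissertation's evaluation code. It assumes each mark receives a real-valued handwriting score, that a mark is labeled handwriting when its score meets a threshold, and that "equal error rate accuracy" means one minus the error rate at the threshold where the two error types are equal. The function name `equal_error_rate` and the toy scores are hypothetical.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for a binary scorer.

    labels: 1 = handwriting, 0 = machine print (an assumed encoding).
    A sample is predicted as handwriting when its score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    best = None
    for t in sorted(set(scores)):
        fnr = sum(s < t for s in pos) / len(pos)   # handwriting missed
        fpr = sum(s >= t for s in neg) / len(neg)  # machine print flagged
        gap = abs(fnr - fpr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (fnr + fpr) / 2, t)
    return best[1], best[2]


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2f} at threshold {threshold}; accuracy at EER = {1 - eer:.2f}")
```

On the toy data, both error rates equal 0.25 at a threshold of 0.6, so the accuracy at that operating point is 0.75. Under the same reading, the reported 96.8 percent would correspond to an equal error rate of about 3.2 percent.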
Keywords/Search Tags: Handwriting