
Adaptive segmentation of document images

Posted on: 2002-09-21
Degree: Ph.D.
Type: Dissertation
University: The University of Nebraska - Lincoln
Candidate: Sylwester, Donald Robert
Full Text: PDF
GTID: 1468390011990235
Subject: Computer Science
Abstract/Summary:
A method is presented for the efficient segmentation of text lines from scanned images of technical documents. The method has been implemented in the ARXYC (Adaptive Recursive XY Cut) algorithm, which constructs an XY-tree to represent the geometric layout structure of a page image in which the text lines are found as leaf nodes.

Geometric layout analysis is a subcomponent of the Document Image Analysis processing sequence. It is typically preceded by scanning a document into a pixel map, preprocessing the pixel map to reduce noise and remove skew, and thresholding to a binary image. It is typically followed by a mapping of the geometric layout to a function representation and by recovery of text and graphics from the pixel image.

Technical documents are sufficiently varied in structure to be challenging to segmentation algorithms, yet sufficiently regular to be amenable to analysis. The vast store of archived technical documentation attests to the importance of the task.

ARXYC achieves high generality by depending on only a single primary parameter, the resolution-independent gap-ratio-threshold. ARXYC constructs an initial XY-tree in which the desired text lines are over-segmented into many fragments, then dynamically transforms the XY-tree to the target tree using three operators (cut, glue and flip) while adaptively applying the threshold to the merging of fragments into text lines. ARXYC monitors the dynamic changes in the structure of the XY-tree to avoid the most serious segmentation error: merging two fragments across the gap between columns.

Results are shown for three experiments on an image set of 97 document pages from a variety of technical journals. The first selects a single fixed threshold for a set of documents based on a sample from that set. The second selects a single fixed threshold for a specific image based on intrinsic measures of the onset of column bridging. In the third, ARXYC adaptively applies a varying threshold to each image, guided by the dynamic behavior of the XY-tree, and matches, on average, 98.8% of the ground-truth text lines.
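
For readers unfamiliar with the underlying idea, the following is a minimal sketch of a plain recursive XY cut with a fixed gap threshold. It is not ARXYC: the cut, glue and flip operators and the adaptive threshold are not reproduced, and the abstract does not say how the gap-ratio-threshold is normalized. The sketch assumes the ratio is taken against a caller-supplied reference length `unit` (for example, an estimated character height in pixels). All names here (`Node`, `make_page`, `segment`, `leaves`) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


@dataclass
class Node:
    """A rectangular page region; bottom and right are exclusive."""
    top: int
    bottom: int
    left: int
    right: int
    children: List["Node"] = field(default_factory=list)


def make_page(height: int, width: int,
              boxes: List[Tuple[int, int, int, int]]) -> Image:
    """Build a blank page and fill each (top, bottom, left, right) box with ink."""
    img = [[0] * width for _ in range(height)]
    for top, bottom, left, right in boxes:
        for r in range(top, bottom):
            for c in range(left, right):
                img[r][c] = 1
    return img


def ink_spans(profile: List[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) ink spans separated by at least min_gap empty positions."""
    spans, start, last_ink = [], None, None
    for i, has_ink in enumerate(profile):
        if not has_ink:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            spans.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


def xy_cut(img: Image, node: Node, min_gap: int,
           horizontal: bool = True, tried_other: bool = False) -> Node:
    """Split node along one projection profile, alternating direction per level."""
    if horizontal:  # project onto the vertical axis: one flag per row
        profile = [any(img[r][node.left:node.right])
                   for r in range(node.top, node.bottom)]
    else:           # project onto the horizontal axis: one flag per column
        profile = [any(img[r][c] for r in range(node.top, node.bottom))
                   for c in range(node.left, node.right)]
    spans = ink_spans(profile, min_gap)
    if len(spans) <= 1:
        # No cut in this direction; try the other one once, else stay a leaf.
        if not tried_other:
            xy_cut(img, node, min_gap, not horizontal, True)
        return node
    for start, end in spans:
        if horizontal:
            child = Node(node.top + start, node.top + end, node.left, node.right)
        else:
            child = Node(node.top, node.bottom, node.left + start, node.left + end)
        node.children.append(xy_cut(img, child, min_gap, not horizontal))
    return node


def segment(img: Image, gap_ratio: float, unit: int) -> Node:
    """Build an XY-tree; the gap ratio is scaled by `unit` (an assumption)."""
    min_gap = max(1, math.ceil(gap_ratio * unit))
    return xy_cut(img, Node(0, len(img), 0, len(img[0])), min_gap)


def leaves(node: Node) -> List[Node]:
    """Collect leaf regions in tree order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

On a small two-column page, a threshold below the column gap separates the columns and then the lines within each column, while a threshold above the column gap leaves the page as a single region. The second case is the column-bridging error that the abstract identifies as the most serious one.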
Keywords/Search Tags: Image, Text lines, Document, Segmentation, ARXYC, XY-tree, Technical