Document image binarization based on texture analysis

Posted on: 1996-08-10
Degree: Ph.D
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Liu, Ying
Full Text: PDF
GTID: 1468390014986965
Subject: Computer Science
Abstract/Summary:
Document image binarization has been a long-standing problem for unconstrained document images. Although various thresholding algorithms have been developed over the years, problems associated with strong noise, complex patterns, poor contrast, and variable modalities in gray-scale histograms still limit the performance of document image analysis systems. Given the unpredictable nature of these image attributes, few thresholding algorithms work consistently well for document image binarization. This dissertation presents texture-feature-based thresholding algorithms to address these difficulties.

The philosophy of our thresholding approach is that texture domain knowledge of document images is important for judging binarization quality and can therefore guide the binarization process; that is, suitably defined texture features of document images can be used to assist the selection of the optimal threshold.

Our thresholding scheme consists of three steps. First, candidate thresholds are produced through the iterative use of Otsu's algorithm. Second, texture features associated with each candidate threshold are extracted from the run-length histogram of the image binarized at that threshold. Third, the optimal threshold is selected so that the most desirable document texture features are preserved. This thresholding scheme was implemented in both global and adaptive modes. With our program design, the algorithms require only one image scan pass, which facilitates their hardware implementation for a commercial system.

Experimental results with 9000 machine-printed address blocks from an unconstrained US mail stream demonstrated that over 99.6% of the images were well binarized by our thresholding method, which is appreciably better than the results obtained by existing thresholding techniques. In addition, a system run with 500 difficult mail address blocks showed that our algorithm achieved an 8.1% higher character recognition rate than Otsu's algorithm.
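
The three-step scheme described in the abstract (candidate thresholds from iterated Otsu, run-length texture features per candidate, selection of the best candidate) can be illustrated with a minimal global-mode sketch. This is not the dissertation's implementation. The abstract does not say how Otsu's algorithm is iterated or which texture features are used, so the sketch assumes that Otsu is re-applied to the darker part of the histogram, and it uses a hypothetical feature: the fraction of horizontal black runs whose length falls in an assumed stroke-width range. All function names and parameters (`rounds`, `max_len`, `lo`, `hi`) are illustrative, and the single-scan design mentioned in the abstract is not reproduced.

```python
from itertools import groupby


def gray_histogram(image, levels=256):
    """Count how many pixels take each gray level."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    return hist


def otsu_threshold(hist):
    """Return t maximizing between-class variance (pixels <= t are dark)."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def candidate_thresholds(hist, rounds=3):
    """ASSUMED iteration: re-run Otsu on the levels at or below the last threshold."""
    candidates, upper = [], len(hist)
    for _ in range(rounds):
        sub = hist[:upper]
        if sum(1 for h in sub if h) < 2:
            break  # fewer than two occupied levels: nothing left to split
        t = otsu_threshold(sub)
        candidates.append(t)
        upper = t + 1
    return candidates


def run_length_histogram(image, threshold, max_len=8):
    """Histogram of horizontal black-run lengths; runs longer than max_len are clipped."""
    rl = [0] * (max_len + 1)
    for row in image:
        for is_black, run in groupby(row, key=lambda p: p <= threshold):
            if is_black:
                rl[min(sum(1 for _ in run), max_len)] += 1
    return rl


def stroke_score(rl, lo=2, hi=4):
    """HYPOTHETICAL texture feature: share of black runs with stroke-like length."""
    total = sum(rl)
    return sum(rl[lo:hi + 1]) / total if total else 0.0


def select_threshold(image, levels=256):
    """Pick the candidate whose binarization has the highest stroke score."""
    cands = candidate_thresholds(gray_histogram(image, levels))
    if not cands:
        raise ValueError("image has fewer than two gray levels")
    return max(cands, key=lambda t: stroke_score(run_length_histogram(image, t)))
```

On a toy image with dark 3-pixel strokes (level 20), isolated smudge pixels (level 60), and a light background (level 200), plain Otsu would group the smudges with the strokes, while the run-length score prefers the lower candidate because it removes the length-1 speckle runs. Under the assumptions above, that is the sense in which texture features guide threshold selection.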
Keywords/Search Tags:Document image, Texture, Thresholding