Research On Pre-processing And Character Extraction Of Form Document Recognition

Posted on: 2006-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Xie
Full Text: PDF
GTID: 2168360155474023
Subject: Computational Mathematics
Abstract/Summary:
Optical character recognition (OCR) has received considerable attention in recent decades, and form document recognition has become an important area of OCR research and application. This dissertation focuses on two issues in form document recognition: pre-processing and character extraction. Binarization is an important step in image processing and character recognition. To overcome the drawbacks of existing methods in the literature (noise sensitivity, low speed, broken strokes, and so on), a hybrid method based on non-linear contrast enhancement and the Laplacian of Gaussian (LoG) operator is presented for binarizing form document images. Experiments show that our approach has significant advantages. Skew is inevitably introduced during scanning. A new approach based on the Hough transform, namely a Hough transform with parameter constraints, is implemented for form document images. Combined with a fast affine transform, this approach appears to speed up skew correction dramatically. One of our critical tasks is to handle handwritten characters that overlap cell borders. A novel approach is presented to solve this problem: form cells are accurately located through a CTF process, and characters overlapping borders are perfectly extracted using an ECCEA method. De-noising and smoothing are carried out as the final stage. Experiments demonstrate the effectiveness of our approach. Finally, using the methods and algorithms proposed in this dissertation, together with our latest research in areas such as character segmentation, feature extraction, and classification, an automatic transcript recognition system is developed, with an average recognition rate of 90.89%.
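The abstract does not say how the Hough transform is constrained. As a rough illustration only, the sketch below assumes the constraint is a restriction of the angle search to a few degrees around horizontal, which is plausible for the ruling lines of a scanned form. The function name `estimate_skew`, its parameters, and the sum-of-squares scoring are hypothetical choices made for this example and are not taken from the dissertation.

```python
import numpy as np

def estimate_skew(binary, max_angle_deg=5.0, step_deg=0.1):
    """Estimate the skew angle (degrees) of near-horizontal lines.

    binary: 2-D array, nonzero = foreground pixel.
    Only angles in [-max_angle_deg, +max_angle_deg] are searched; this
    restricted range is the assumed "parameter constraint".
    A positive result means lines run downward to the right in image
    coordinates (row index grows with column index).
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # no foreground pixels, nothing to estimate
    angles = np.arange(-max_angle_deg, max_angle_deg + step_deg / 2, step_deg)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = np.deg2rad(a)
        # Signed distance of each pixel from a line through the origin
        # tilted by angle t; pixels on one tilted line share one value.
        rho = ys * np.cos(t) - xs * np.sin(t)
        bins = np.round(rho).astype(int)
        counts = np.bincount(bins - bins.min())
        # Sum of squared votes is largest when votes concentrate in few bins.
        score = float(np.sum(counts.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle
```

Limiting the search to a narrow angle range shrinks the accumulator from a full 180-degree sweep to a small set of candidate angles, which is one common way such a constraint reduces running time. The estimated angle would then be passed to an affine rotation to correct the skew.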
Keywords/Search Tags: OCR, form document recognition, binarization, skew correction, overlapping borders, character extraction