Research On Pre-processing And Character Extraction Of Form Document Recognition

Posted on:2006-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:L Xie

Full Text:PDF

GTID:2168360155474023

Subject:Computational Mathematics

Abstract/Summary:

Optical character recognition (OCR) has received considerable attention in recent decades, and form document recognition has become an important field of OCR research and application. This dissertation focuses on two issues in form document recognition: pre-processing and character extraction. Binarization is an important step in image processing and character recognition. To overcome the drawbacks of current methods in the literature (noise sensitivity, low speed, broken strokes, etc.), a hybrid method based on non-linear contrast enhancement and the LoG operator is presented for binarizing form document images. Experiments show that this approach has significant advantages. Skew is inevitably introduced during scanning. A new approach based on the Hough transform, namely the Hough transform with parameter constraints, is implemented for form document images. Combined with a fast affine transform, this approach dramatically speeds up skew correction. One critical task is handling handwritten characters that overlap cell borders. A novel approach is presented to solve this problem: form cells are accurately located through a CTF process, and characters overlapping borders are perfectly extracted using an ECCEA method. De-noising and smoothing are carried out as a last stage. Experiments demonstrate the effectiveness of the approach. Finally, building on the methods and algorithms proposed in this dissertation, together with our latest research in areas such as character segmentation, feature extraction, and classification, an automatic transcript recognition system is developed, with an average recognition rate of 90.89%.
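
The abstract does not spell out what the parameter constraints on the Hough transform are. As a rough illustration only, one common reading is that ruling lines in a scanned form are nearly horizontal, so votes need to be accumulated only for a narrow band of angles instead of the full 0 to 180 degree range. The sketch below follows that assumption; the function name estimate_skew, the parameters max_skew_deg and step_deg, and the sign convention (positive skew means y grows with x) are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def estimate_skew(points, max_skew_deg=5.0, step_deg=0.5):
    """Estimate the skew angle (degrees) of near-horizontal lines.

    points: iterable of (x, y) foreground pixel coordinates.
    Assumption: the true skew lies within +/- max_skew_deg, so only
    that band of angles is searched (the "parameter constraint" here).
    """
    points = list(points)
    if not points:
        raise ValueError("at least one foreground point is required")

    n_steps = int(round(2 * max_skew_deg / step_deg))
    best_skew, best_votes = 0.0, -1
    for i in range(n_steps + 1):
        skew = -max_skew_deg + i * step_deg
        # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
        # A line with direction angle `skew` has its normal at 90 + skew.
        theta = math.radians(90.0 + skew)
        c, s = math.cos(theta), math.sin(theta)
        # One accumulator row per candidate angle, with rho quantised to 1 px.
        votes = Counter(round(x * c + y * s) for x, y in points)
        peak = max(votes.values())
        # Collinear points fall into a single rho bin at the correct angle,
        # so the tallest peak marks the dominant line direction.
        if peak > best_votes:
            best_skew, best_votes = skew, peak
    return best_skew
```

With the default settings only 21 candidate angles are evaluated, compared with 360 for an unconstrained search at the same 0.5 degree resolution, which is the kind of saving a constrained search is meant to provide. The returned angle could then be passed to an affine rotation to deskew the image.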

Keywords/Search Tags:

OCR, form document recognition, binarization, skew correction, overlapping borders, character extraction

Related items

1	Study On Preprocessing And Text Extraction Algorithms For Complex Form Documents
2	Complex Layout Analysis And Digital Recognition In Medical Record
3	Chinese Forum Punctuation Extraction And Recognition
4	Research On The Algorithm And Realization Of Bill Character Recognition System
5	The Research On Layout Analysis Methods Of Form Image
6	Research On Preprocessing Of Optical Character Recognition For Mobile Devices
7	Adaptive Binarization And Character Recognition For Document Image
8	Research On Form And Chinese Characters Recognition In Printed Chinese Document Recognition System
9	Research On Form Document Image Analysis
10	Research On Form Recognition In Printed Document Recognition System