Studies On Preprocess In Automatic Financial Document Processing System

Posted on: 2005-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Zhang
Full Text: PDF
GTID: 1118360125953579
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Automatic processing of financial documents is an important topic and has become one of the most promising commercial applications of handwriting recognition. Preprocessing is a critical part of such a system, since any error at this stage propagates to all later analysis. Recent research and applications show that many recognition errors are caused by poor-quality images, so research on how to improve the quality of character images before they are sent to the recognition engine is highly significant.

In this dissertation, several key techniques in the preprocessing procedure are discussed.

Characters usually overlap with the preprinted form frame lines, creating serious problems for recognition engines. These lines are detected and removed directly on gray-level images. The two boundaries of a line are detected by the gray-level Hough transform, and lines are removed according to the type of overlap between characters and lines. Experimental results on real-life check images demonstrate the effectiveness of the algorithm: the recognition rate improves from 75.9% to 91.4%.

Character extraction is an important challenge in automatic financial document processing. The difficulties derive mainly from the different types and positions of seal imprints, which are often dark and stroke-like. To solve this problem, a double-edge detection method for strokes based on morphology and two binarization algorithms are proposed: (1) The first, a new recursive thresholding algorithm, relies on a rule that evaluates the segmentation result by analyzing the residual image. It keeps removing the brighter background from the image until only the darkest objects (the characters) are left. (2) The second binarization algorithm is based on the analysis of the GRAY/MDE co-occurrence matrix, where GRAY describes a feature of the pixel itself and MDE describes a local feature of the stroke.
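
The recursive thresholding idea can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the abstract does not give the residual-image evaluation rule, so the stopping rule here (stop when the remaining pixels span less than `min_contrast` gray levels) and the use of the mean as the threshold are assumptions made for illustration, as are the names `recursive_threshold` and `make_demo_image`.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def recursive_threshold(gray: Image, max_rounds: int = 8,
                        min_contrast: int = 10) -> List[List[bool]]:
    """Return a mask that is True for the darkest pixels of `gray`.

    Each round drops the pixels brighter than the mean of the pixels
    still kept. The stopping rule is a stand-in (assumption), not the
    residual-image rule used in the dissertation.
    """
    mask = [[True] * len(row) for row in gray]
    for _ in range(max_rounds):
        values = [v for row, mrow in zip(gray, mask)
                  for v, keep in zip(row, mrow) if keep]
        # Assumed stopping rule: remaining pixels are nearly uniform.
        if not values or max(values) - min(values) < min_contrast:
            break
        threshold = sum(values) / len(values)
        # Keep only pixels darker than the current threshold.
        mask = [[keep and v < threshold for v, keep in zip(row, mrow)]
                for row, mrow in zip(gray, mask)]
    return mask


def make_demo_image() -> Image:
    """Synthetic 10x10 image: bright paper, a mid-gray seal, dark strokes."""
    img = [[200] * 10 for _ in range(10)]
    for r in range(2, 8):
        for c in range(2, 8):
            img[r][c] = 120          # seal imprint
    for r in range(4, 6):
        for c in range(3, 7):
            img[r][c] = 30           # character stroke inside the seal
    return img


if __name__ == "__main__":
    mask = recursive_threshold(make_demo_image())
    for row in mask:
        print("".join("#" if keep else "." for keep in row))
```

On the synthetic image, the first round removes the paper background, the second removes the seal, and the third round stops because only the uniformly dark stroke pixels remain. Real check images would need a more careful stopping rule, which is the role of the residual-image analysis in the dissertation.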
Experiments demonstrate the effectiveness of the two proposed methods compared with five other commonly used binarization methods, both subjectively (by visual inspection) and objectively (by recognition rate).

Handwritten numeral segmentation is a hard but important task in an OCR system. A segmentation method for handwritten numeral strings in form frames is proposed. Numerals in form frames are usually connected in one of two ways: the transition-connected type (connected by a long horizontal stroke) and the share-connected type (connected by sharing part of a stroke). For the former type, candidate segmentation positions are first detected from local contour features, and the better position is then selected by a two-class classifier. For the share-connected type, segmentation is performed by analyzing the contour features.

On the basis of the key techniques mentioned above, a practical handwritten-check processing system for supervision is briefly introduced. It has been applied in practice.
Keywords/Search Tags: financial document image processing, line detection, line removal, edge detection, image binarization, handwritten numeral segmentation, character recognition