
Font classification and character segmentation for postal address reading

Posted on: 2002-03-08
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Jung, Minchul
Full Text: PDF
GTID: 1468390011492315
Subject: Computer Science
Abstract/Summary:
This dissertation introduces a new font classification approach and a new character segmentation algorithm to improve the performance of return address recognition in US mail pieces.

The proposed font classification identifies font style, font group, and font name from a single word in a return address. Because the classification is a priori and local, it makes possible an OCR system that consists of various font-specific character segmentation tools and various mono-font character recognizers.

The proposed font classification uses ascenders, descenders, and serifs extracted from a word image. Gradient features of those sub-images are extracted and fed to a neural network classifier, which produces the font classification results. The font classifier presented in this research can identify a font even from one word that has severely touching characters.

The proposed character segmentation is a font-specific approach that uses side profiles according to font groups. The merged parts of touching characters produce patterns whose shapes differ from the primitive character patterns. However, the leftmost side and the rightmost side of the touching characters are not affected by the touching.

Since the side profile of each character is unique, analysis of those side profiles yields the candidate single characters for the touching characters. A cutting cost and a tangent cost are defined to find an optimal segmenting path.

The results show that font classification accuracy reaches about 95.4%, even with severely touching characters, on seven PostScript fonts: Avant Garde, Bookman, Courier, Helvetica, New Century Schoolbook, Palatino, and Times.

The performance of the character segmentation was measured using a real envelope reader system, which can recognize return addresses in US mail pieces and sort the mail pieces according to the senders. In a test on 3359 mail pieces, the proposed character segmentation improved the system's performance from 68.92% to 80.08%.
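
The idea of a side profile mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the dissertation's exact definition, so the code below assumes one common reading: for each pixel row of a binary image, the left profile is the distance from the left edge to the first ink pixel, and the right profile is the distance from the right edge to the last ink pixel. The names `left_profile`, `right_profile`, and `blob` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Assumed representation: a binary image as rows of 0/1, where 1 means ink.
Image = List[List[int]]


def left_profile(img: Image) -> List[Optional[int]]:
    """Per row: distance from the left edge to the first ink pixel (None if the row is blank)."""
    return [next((x for x, v in enumerate(row) if v), None) for row in img]


def right_profile(img: Image) -> List[Optional[int]]:
    """Per row: distance from the right edge to the last ink pixel (None if the row is blank)."""
    profile: List[Optional[int]] = []
    for row in img:
        width = len(row)
        last = next((x for x in range(width - 1, -1, -1) if row[x]), None)
        profile.append(None if last is None else width - 1 - last)
    return profile


# Toy blob standing in for two touching characters (hypothetical data).
blob = [
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

print(left_profile(blob))   # [1, 0, 0, None]
print(right_profile(blob))  # [0, 0, 0, None]
```

Under this reading, the left profile of a touching pair depends only on the first character and the right profile only on the last, which is why the outer sides could be matched against per-font character profiles even when the middle is merged. How the dissertation actually matches profiles, and how the cutting cost and tangent cost are computed, is not stated in the abstract and is not modeled here.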
Keywords/Search Tags: Character segmentation, Font classification, Mail pieces, Address