A hybrid two-dimensional HMM and MLP OCR system for processing multi-font and low-quality English documents

Posted on: 2005-07-27
Degree: M.Comp.Sc
Type: Thesis
University: Concordia University (Canada)
Candidate: Fu, Nenghong
Full Text: PDF
GTID: 2458390008997776
Subject: Computer Science
Abstract/Summary:
This thesis presents a hybrid two-dimensional Hidden Markov Model (2-D HMM) and Multi-Layer Perceptron (MLP) OCR system for recognizing multi-font printed documents of varying quality, with emphasis on the new methods it proposes. First, a statistical analysis of the frequency of touching characters was conducted on real documents. Based on these results, which may be the first formal statistics on touching characters, a new classifier was designed to recognize some frequent touching characters without segmentation. Second, a new hierarchical character classifier is presented to enhance character recognition accuracy. Characters are grouped into several categories according to their layout context (ascender, descender, and center), and several independent classifiers are implemented to recognize the characters in each group. In addition, a 2-D HMM is included in the hierarchical classifier to improve the character recognition rate, and an automatic builder of HMMs for special touching characters is also described. (Abstract shortened by UMI.)
Keywords/Search Tags: HMM, Touching, Character