Gray Image-based Segmentation Method For Characters

Posted on: 2005-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2178360185496931
Subject: Computer applications
Abstract/Summary:
Optical Character Recognition (OCR) is a computer-assisted process in which characters printed on paper or another medium are recognized. From a theoretical standpoint, OCR is an application of Pattern Recognition and Artificial Intelligence. From a practical standpoint, OCR is an automated, high-speed input method for information processing and is regarded as an important component of the new generation of intelligent computer interfaces. Research has shown that the recognition rate of an OCR system depends closely on its character segmentation technique, which has made segmentation a focus of research in recent years.

This thesis first presents the theoretical basis for character segmentation, which includes document image binarization and page analysis. Various segmentation techniques and their binarization methods are introduced, along with the water-drop algorithm for hand-written character recognition and its improved version.

To overcome the weaknesses of conventional segmentation algorithms in OCR, a new segmentation method based on the gray-scale image is introduced. Its most important features are the grading of pixel gray levels in the image and the construction of a tree structure for the whole document image. By dividing the tree's branches and leaves, characters, pictures and forms can be correctly segmented. The new method also modifies the water-drop algorithm so that it can be applied to gray-scale images. Experimental results show that the method is very effective for document images of poor quality, documents containing both Chinese and English characters, and documents with varying backgrounds.
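The abstract does not say how the gray levels are graded or how the tree is built, so the following is only a minimal sketch of one possible reading, not the thesis's actual method. It assumes uniform quantization of 8-bit gray values into a few grades (0 is darkest), 4-connected regions, and a tree in which each region found at a darker grade is a child of the lighter region that contains it. The names `grade`, `components` and `build_tree` are hypothetical.

```python
from collections import deque


def grade(image, levels=4, max_gray=255):
    """Map each gray value (0..max_gray) to one of `levels` grades; 0 is darkest."""
    return [[min(p * levels // (max_gray + 1), levels - 1) for p in row]
            for row in image]


def components(graded, level):
    """Label 4-connected regions of pixels whose grade is <= level."""
    h, w = len(graded), len(graded[0])
    labels = [[-1] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if graded[y][x] > level or labels[y][x] != -1:
                continue
            idx = len(regions)
            pixels = []
            labels[y][x] = idx
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1
                            and graded[ny][nx] <= level):
                        labels[ny][nx] = idx
                        queue.append((ny, nx))
            regions.append(pixels)
    return labels, regions


def build_tree(image, levels=4):
    """Build a nesting tree of regions, from the lightest grade (root) to the darkest."""
    graded = grade(image, levels)
    nodes = []  # each node: level, pixels, parent index, child indices
    prev_labels, prev_ids = None, None
    for level in range(levels - 1, -1, -1):
        labels, regions = components(graded, level)
        ids = []
        for pixels in regions:
            y, x = pixels[0]
            # A region at this grade lies inside exactly one region of the
            # previous (lighter) grade, which becomes its parent.
            parent = None if prev_labels is None else prev_ids[prev_labels[y][x]]
            node_id = len(nodes)
            nodes.append({"level": level, "pixels": pixels,
                          "parent": parent, "children": []})
            if parent is not None:
                nodes[parent]["children"].append(node_id)
            ids.append(node_id)
        prev_labels, prev_ids = labels, ids
    return nodes


if __name__ == "__main__":
    # Two dark "strokes" on a light background.
    image = [
        [250, 250, 250, 250, 250],
        [250,  10, 250,  20, 250],
        [250, 250, 250, 250, 250],
    ]
    tree = build_tree(image, levels=4)
    leaves = [n for n in tree if not n["children"]]
    print(len(tree), "nodes,", len(leaves), "leaves")
```

In this sketch the root covers the whole page and the leaves are the darkest isolated regions, so candidate characters would appear near the leaves while larger structures such as pictures or forms would appear as branches higher up. How the thesis actually decides where to cut the tree is not stated in the abstract.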
Keywords/Search Tags: Character Segmentation, Gray-scale Image, OCR, Binarization