
Studies Of OCR Technology For Degraded Document Images

Posted on: 2006-07-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Sun
Full Text: PDF
GTID: 1118360185495686
Subject: Computer system architecture
Abstract/Summary:
This dissertation studies OCR technology for degraded document images at three levels: theory, algorithms, and applications. It mainly covers the following topics.

First, the relationships between OCR for degraded document images and human cognition, classical AI, and binary recognition technology are analyzed. Several novel viewpoints that differ from the traditional conception are proposed, and a theoretical framework of OCR technology for degraded document images is established.

Guided by this framework and taking into account the characteristics of degraded document images, an OCR flow for degraded document images is established, with highly efficient algorithms put forward for each of its main steps. These algorithms include the following.

Connected-component-based Character Segmentation Method by Using Multi-layer Structure: To overcome the weaknesses of conventional segmentation algorithms in OCR, a new segmentation method for degraded document images is proposed. Its most important feature is that it finds the optimal segmentation threshold according to how the attributes of the connected components vary. First, the whole document image is organized into a multi-layer structure of graded connected components. Then, the connected components on the main layer are merged or split by heuristic rules. The resulting connected components are the expected segmentation results. Experimental results demonstrate that this method is more effective than the traditional method.

Automated Seeded Region Growing Method for Binarization Based on Topographic Features: This is a new binarization method for gray-scale images of individual characters. It has no explicit threshold; instead, it searches for print and background pixels directly using a modified seeded region growing (SRG) technique. The method applies higher-level knowledge throughout the algorithm. First, seed pixels are selected automatically according to their topographic features; then regions are grown under the control of a new weighted priority until all pixels are labeled black or white; finally, noisy regions are removed based on the stroke width feature. These features contain essential structural information; hence...
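
The seed-then-grow idea behind the SRG binarization can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not specify the topographic seed rule or the weighted priority, so the sketch assumes a simple stand-in for each. Seeds are local minima darker than the image mean (print) and local maxima brighter than the mean (background), and the growth priority is the plain absolute difference between a pixel and the running mean of the region claiming it, as in classical seeded region growing. The names `select_seeds` and `srg_binarize` are hypothetical, and the stroke-width noise removal step is omitted because the abstract gives no detail on it.

```python
import heapq

PRINT, BACKGROUND = 0, 1  # output labels: 0 = black (print), 1 = white
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbors(y, x, h, w):
    """Yield the 8-connected neighbors of (y, x) that lie inside the image."""
    for dy, dx in OFFSETS:
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def select_seeds(gray):
    """Assumed seed rule (a stand-in for the topographic features):
    local minima darker than the image mean seed the print class,
    local maxima brighter than the mean seed the background class."""
    h, w = len(gray), len(gray[0])
    mean = sum(map(sum, gray)) / (h * w)
    seeds = {}
    for y in range(h):
        for x in range(w):
            v = gray[y][x]
            around = [gray[ny][nx] for ny, nx in neighbors(y, x, h, w)]
            if v < mean and all(v <= n for n in around):
                seeds[(y, x)] = PRINT
            elif v > mean and all(v >= n for n in around):
                seeds[(y, x)] = BACKGROUND
    return seeds


def srg_binarize(gray):
    """Grow the two seeded classes until every pixel is labeled."""
    h, w = len(gray), len(gray[0])
    labels = [[None] * w for _ in range(h)]
    total = [0.0, 0.0]  # running gray-level sum per class
    count = [0, 0]      # running pixel count per class
    heap = []

    def assign(y, x, lab):
        labels[y][x] = lab
        total[lab] += gray[y][x]
        count[lab] += 1

    def push_neighbors(y, x, lab):
        # Assumed priority: distance from the current mean of the claiming class.
        mean = total[lab] / count[lab]
        for ny, nx in neighbors(y, x, h, w):
            if labels[ny][nx] is None:
                heapq.heappush(heap, (abs(gray[ny][nx] - mean), ny, nx, lab))

    seeds = select_seeds(gray)
    for (y, x), lab in seeds.items():
        assign(y, x, lab)
    for (y, x), lab in seeds.items():
        push_neighbors(y, x, lab)

    while heap:
        _, y, x, lab = heapq.heappop(heap)
        if labels[y][x] is not None:
            continue  # already claimed through a better-matching candidate
        assign(y, x, lab)
        push_neighbors(y, x, lab)

    # Pixels stay unlabeled only when no seed exists (flat image): treat as white.
    return [[BACKGROUND if lab is None else lab for lab in row] for row in labels]
```

On a small image whose rows are all `[200, 90, 40, 150, 200]`, the sketch labels the two darkest columns as print and the rest as background, without any threshold being applied to the final labeling.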
Keywords/Search Tags: OCR, Degraded document, Pattern recognition, Image processing, Gray-scale Image, Binarization, Segmentation, Recognition, Connected component, Seeded region growing, Topographic feature, Similar Chinese character