With the growth of automatic document image processing, image analysis and layout analysis have attracted considerable attention in recent research. Starting from engineering applications, this thesis develops layout analysis and OCR for Chinese document images under specific conditions. First, a text-region-first layout analysis system for Chinese documents is implemented, and algorithms for document image preprocessing, form extraction, image extraction, and text extraction are studied. Then, a character recognition system is implemented that recognizes printed digits in document images using features extracted with the Gabor transform. The main contents are as follows:
1. The current state of research on document image layout analysis, along with the problems and difficulties it faces, is reviewed in detail.
2. The methods and processes for preprocessing Chinese document images are studied in depth, including an effective image binarization algorithm based on Huang’s fuzzy theory, document image skew detection and correction, and a character recognition algorithm based on Gabor filters.
3. Form line extraction based on run-length smoothing, image extraction based on connected component analysis, and text extraction based on contour extraction are studied further. On this basis, the text-region-first Chinese document layout analysis system is implemented.
4. The key technologies used to implement printed digit recognition are described in detail.
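As a rough illustration of the run-length smoothing idea mentioned for form line extraction, the following minimal sketch fills short background gaps in each row of a binary image so that broken line segments merge into longer runs. It is not the thesis's implementation: the function names (`rlsa_row`, `rlsa_horizontal`, `longest_run`), the parameter `max_gap`, the convention that 1 marks a foreground pixel, and the choice to fill only gaps bounded by foreground on both sides are all assumptions made for this example.

```python
def rlsa_row(row, max_gap):
    """Fill background runs of length <= max_gap that lie between two
    foreground pixels (1 = foreground, 0 = background; assumed convention)."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= max_gap:
                    # Short gap between two foreground pixels: smooth it over.
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply horizontal run-length smoothing to every row of a binary image
    given as a list of lists."""
    return [rlsa_row(row, max_gap) for row in image]


def longest_run(row):
    """Length of the longest consecutive foreground run in a row."""
    best = cur = 0
    for px in row:
        cur = cur + 1 if px == 1 else 0
        best = max(best, cur)
    return best


if __name__ == "__main__":
    # A horizontal line broken by single-pixel gaps (hypothetical data).
    broken = [[0, 1, 1, 0, 1, 1, 1, 0, 1, 0]]
    smoothed = rlsa_horizontal(broken, max_gap=1)
    print(smoothed[0], longest_run(smoothed[0]))
```

In a sketch like this, rows whose longest smoothed run exceeds some length threshold could be treated as candidate horizontal form lines; applying the same routine to the transposed image would handle vertical lines. The thresholds would need tuning for the scan resolution.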