With the arrival of the information age, the content of information has become increasingly complex. Chinese documents contain not only Chinese text but also English text, various formulas, tables and images. Entering such content into a computer quickly and efficiently is a key task in information processing. Chinese character recognition systems were developed for recognition-based input and format conversion. However, the systems currently in use cannot recognize formulas. Designing a Chinese document recognition system that includes formula recognition therefore has both practical value and theoretical significance.

This thesis builds on an existing Printed Chinese Document Recognition System and contributes improvements in the following aspects.

First, after preprocessing the original document image, we compared histogram-based threshold segmentation with Otsu's method for producing the binary image. The results show that Otsu's method outperformed its counterpart, so we adopt it.

In addition, a scanned copy is inevitably tilted relative to the original document. To minimize the tilt, we apply tilt detection and then rotate the text image in the reverse direction. With this approach, tilts between -5° and 5° can be corrected faster and more efficiently.

Next, table detection and extraction are applied. Here we use mathematical morphology transforms to detect and extract tables. The extracted tables are then enhanced by refining the table lines and merging straight lines.

The last part is Chinese character recognition.
A multi-classifier approach that uses multiple features of the text image is applied to build two-level font templates.

In summary, this work adds table recognition to the original Printed Chinese Document Recognition System, implements tilt correction and table detection and extraction, and improves the template library to cover an expanded set of Chinese characters.

This thesis targets image acquisition and recognition for relatively formal, standardized books, newspapers and magazines. It improves the functionality of the printed Chinese document recognition system developed by earlier students in the research group. The focus is on table detection and extraction and on expanding the corresponding Chinese character library. The work addresses a problem that mature technology for the automatic processing of printed documents has not solved, namely table recognition, and yields a printed document recognition system with formula recognition. This improves the utilization of the original documents and facilitates the use of formulas, which is of far-reaching significance for the retrieval, development and dissemination of technology.
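
The binarization step mentioned above relies on Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The following is a minimal sketch of that idea, not the implementation used in the system. It assumes an 8-bit grayscale image given as a flat list of integers in the range 0 to 255, and the function names otsu_threshold and binarize are illustrative only.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Assumption: `pixels` is a flat list of integers in 0..255.
    Pixels <= t form the dark class, pixels > t the bright class.
    """
    # Build the gray-level histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of gray levels in the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue    # dark class still empty
        if w1 == 0:
            break       # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var = between
            best_t = t
    return best_t


def binarize(pixels, t):
    """Map dark pixels (<= t) to 0 (ink) and the rest to 255 (paper)."""
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Synthetic example: 30 dark "ink" pixels and 90 bright "paper" pixels.
    image = [20, 30, 40] * 10 + [200, 215, 230] * 30
    t = otsu_threshold(image)
    binary = binarize(image, t)
    print("threshold:", t)
    print("ink pixels:", binary.count(0), "paper pixels:", binary.count(255))
```

Because the threshold is derived from the histogram of each image, no manually tuned value is needed, which is one plausible reason the method compares well against a fixed histogram threshold on scanned pages with varying brightness.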