With the development of information science and technology, people increasingly use computers as information processing tools. This not only saves manpower and time but also effectively reduces human operating errors. The informatization of physical materials is an important application area of automatic computer information processing. In the early days, technical limitations meant that many materials were recorded on paper, which entailed high storage costs, limited storage life, and costly sorting and searching. Education is a major branch of the informatization of physical materials. Examinations are an effective means of evaluating teaching during the teaching process, and digitizing test papers not only supports their electronic storage but also enables fast and effective analysis of their content. At present, OCR technology performs well on printed text, while the recognition of handwritten numerals and mathematical formulas still leaves much room for research and application. Taking test paper images as the research carrier, this thesis used digital image processing and deep learning for handwritten numeral recognition and mathematical formula recognition. The main research consists of two parts.

(1) First, the thesis used the anchor points of the test paper and some fixed elements of the paper layout to extract each question number and match it to its answer area. This process included five steps: anchor point recognition, anchor point coordinate ordering, anchor point coordinate sequence division, answer area division, and question number extraction and recognition. The handwritten scores on the test paper were then extracted through color space conversion, contour extraction and filtering, segmentation of touching characters, and image normalization, and were recognized with a CNN. The common digit recognition model LeNet-5 performs well on the MNIST dataset, but its accuracy is lower in practical applications because of regional differences in handwriting. Following the idea of transfer learning, this thesis deepened the network and trained it in stages on the MNIST dataset and on a dataset of handwritten scores from test papers, which improved its practical performance while preserving the generality of handwritten numeral recognition. Finally, the thesis combined the answer area information with the handwritten score information to design an automatic score tallying process, applying handwritten numeral recognition to actual teaching work.

(2) Mathematical formula recognition was studied with both traditional methods and deep learning methods. The traditional method recognized formulas through character recognition followed by structural analysis. For character recognition, this thesis segmented characters by connected components and used the common font symbols in AMSFonts as templates. Characters were described by normalized central moments of inertia, circular topological features, and Hu invariant moments, which keeps recognition accurate under scaling, translation, rotation, and similar transformations. The structural analysis drew on the TeX typesetting system: taking the box as the basic unit, it defined common formula structures and the corresponding merging rules. The deep learning method recognized the whole formula with an Encoder-Decoder model with attention. This thesis used the Inception v3 network in the Encoder to strengthen feature extraction and added position embedding to retain the positional information of the image features. It then compared alternative Encoder, Decoder, and attention designs to select the best recognition model; the final model achieved a BLEU score of 88.63% on the im2latex-100k dataset and the project dataset.
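The rule used for anchor point coordinate ordering and sequence division is not specified here. A minimal sketch of one plausible approach follows, assuming that anchors are detected as (x, y) centroids in image coordinates (y grows downward) and that anchors on the same printed row differ in y by less than a hypothetical pixel tolerance `row_tol`. It sorts anchors top to bottom, starts a new row whenever the vertical gap exceeds the tolerance, and sorts each row left to right.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of a detected anchor


def order_anchors(points: List[Point], row_tol: float = 10.0) -> List[List[Point]]:
    """Group anchor centroids into rows (top to bottom), each row sorted left to right.

    row_tol is a hypothetical tolerance in pixels: a new row starts whenever the
    vertical gap to the previously scanned anchor exceeds it.
    """
    rows: List[List[Point]] = []
    for p in sorted(points, key=lambda q: q[1]):  # scan anchors top to bottom
        if rows and p[1] - rows[-1][-1][1] <= row_tol:
            rows[-1].append(p)  # close enough vertically: same row
        else:
            rows.append([p])  # large vertical gap: start a new row
    return [sorted(row, key=lambda q: q[0]) for row in rows]
```

Each resulting row of anchors can then bound a horizontal band of the page, which is one simple way to divide the answer areas.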
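Image normalization matters because a CNN pretrained on MNIST expects digits in MNIST's layout: MNIST digits were scaled to fit a 20x20 box with the aspect ratio preserved and placed in a 28x28 image. The sketch below is a simplified version under stated assumptions. It takes a binary image (1 = ink) of one segmented score digit, uses nearest-neighbour scaling, and centres the bounding box rather than the centre of mass that MNIST used. The `box` and `canvas` defaults are assumptions chosen to mirror MNIST.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def normalize_digit(img: Image, box: int = 20, canvas: int = 28) -> Image:
    """Crop to the ink bounding box, scale the longer side to `box` pixels
    (nearest neighbour, aspect ratio kept) and centre it on a canvas x canvas grid."""
    rows = [r for r, line in enumerate(img) if any(line)]
    if not rows:  # blank input: return an empty canvas
        return [[0] * canvas for _ in range(canvas)]
    cols = [c for c in range(len(img[0])) if any(line[c] for line in img)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    h, w = bottom - top + 1, right - left + 1
    scale = box / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    out = [[0] * canvas for _ in range(canvas)]
    off_r, off_c = (canvas - nh) // 2, (canvas - nw) // 2  # centre the scaled glyph
    for r in range(nh):
        for c in range(nw):
            # map each output pixel back to its nearest source pixel
            src_r = top + min(h - 1, int(r / scale))
            src_c = left + min(w - 1, int(c / scale))
            out[off_r + r][off_c + c] = img[src_r][src_c]
    return out
```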
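The automatic score tallying process combines answer areas with recognized handwritten scores. One way to do this is sketched below; it is illustrative only and not the thesis's implementation. The sketch assumes that each answer area is an axis-aligned rectangle keyed by its question number, and that each recognized score is reported as its (x, y) position on the page together with its value. Each score is assigned to the area that contains it, and scores falling outside every area are ignored. The name `tally_scores` and the `"total"` key are hypothetical.

```python
from typing import Dict, List, Tuple

Area = Tuple[int, int, int, int]  # (top, left, bottom, right) in page pixels


def tally_scores(areas: Dict[str, Area], scores: List[Tuple[int, int, int]]) -> Dict[str, int]:
    """Assign each recognised score (x, y, value) to the answer area containing
    its position; return per-question totals plus an overall 'total' entry."""
    result = {q: 0 for q in areas}
    for x, y, value in scores:
        for q, (top, left, bottom, right) in areas.items():
            if top <= y <= bottom and left <= x <= right:
                result[q] += value
                break  # a score belongs to at most one answer area
    result["total"] = sum(result[q] for q in areas)
    return result
```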
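Connected-component segmentation treats every maximal group of touching ink pixels as one candidate character. The sketch below labels 8-connected components of a binary formula image with breadth-first search, then returns their bounding boxes in left-to-right order. It is a generic implementation of the technique and not the thesis's code. Symbols made of several pieces, such as "=" or "i", produce multiple components and would need an extra merging rule, which this sketch omits.

```python
from collections import deque
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(img: List[List[int]]) -> List[BBox]:
    """Label 8-connected ink regions (value 1) with BFS and return their
    bounding boxes ordered left to right."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[BBox] = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):  # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda b: b[1])
```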
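Hu's seven invariant moments are built from the normalized central moments $\eta_{pq} = \mu_{pq} / \mu_{00}^{1 + (p+q)/2}$, where $\mu_{pq}$ are moments taken about the centroid. This construction makes them insensitive to translation, scaling, and rotation. The sketch below computes them for a binary or grey-level glyph using the standard definitions. On a pixel grid, invariance is exact for integer shifts and 90-degree rotations but only approximate for scaling. The function name `hu_moments` is illustrative.

```python
from typing import List, Sequence


def hu_moments(img: Sequence[Sequence[int]]) -> List[float]:
    """Seven Hu invariant moments of an image given as rows of pixel values."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / m00  # centroid
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p: int, q: int) -> float:
        # normalized central moment: mu_pq / mu_00 ** (1 + (p + q) / 2), with mu_00 = m00
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

In template matching, a segmented character would be assigned to the AMSFonts template whose feature vector is nearest, typically after a log transform because the seven values differ by orders of magnitude.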
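TeX-style structural analysis starts by deciding how neighbouring boxes relate to one another. The sketch below is a hypothetical heuristic and not the thesis's merging rules. It classifies the next box as a superscript, subscript, or same-line neighbour of a base box by comparing its vertical centre against the top and bottom quarter of the base box, then merges the pair into LaTeX. Image coordinates are assumed, so y grows downward. The `margin` threshold of 0.25 is an assumption.

```python
from typing import NamedTuple


class GlyphBox(NamedTuple):
    top: float
    left: float
    bottom: float
    right: float


def relation(base: GlyphBox, nxt: GlyphBox, margin: float = 0.25) -> str:
    """Classify `nxt` relative to `base` using a hypothetical vertical margin."""
    height = base.bottom - base.top
    centre = (nxt.top + nxt.bottom) / 2
    if centre < base.top + margin * height:
        return "superscript"
    if centre > base.bottom - margin * height:
        return "subscript"
    return "right"


def combine(base_tex: str, next_tex: str, rel: str) -> str:
    """Merge two boxes' LaTeX according to their relation."""
    if rel == "superscript":
        return f"{base_tex}^{{{next_tex}}}"
    if rel == "subscript":
        return f"{base_tex}_{{{next_tex}}}"
    return f"{base_tex} {next_tex}"
```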
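The kind of position embedding used is not specified here; both learned and fixed embeddings are common. As an assumption, the sketch below uses a fixed 2D sinusoidal encoding. The first half of the channels encodes the row of each Encoder feature vector and the second half its column, each with the sin/cos scheme of the original Transformer. The code is then added element-wise to the feature map, so the attention-based Decoder can tell where in the image each feature came from. The function names are illustrative.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # height x width x channels


def positional_encoding_2d(height: int, width: int, channels: int) -> Grid:
    """Fixed sinusoidal codes: first half of channels encodes the row,
    second half the column. channels must be divisible by 4."""
    if channels % 4:
        raise ValueError("channels must be divisible by 4")
    half = channels // 2

    def encode(pos: int) -> List[float]:
        code: List[float] = []
        for i in range(half // 2):
            freq = 1.0 / (10000 ** (2 * i / half))
            code += [math.sin(pos * freq), math.cos(pos * freq)]
        return code

    return [[encode(r) + encode(c) for c in range(width)] for r in range(height)]


def add_position(features: Grid, pe: Grid) -> Grid:
    """Element-wise sum of an Encoder feature map and its position codes."""
    return [[[f + p for f, p in zip(fv, pv)] for fv, pv in zip(frow, prow)]
            for frow, prow in zip(features, pe)]
```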
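The reported 88.63% is a BLEU score, which compares the predicted LaTeX token sequence with the reference. The exact BLEU variant used is not stated. The sketch below implements standard corpus-level BLEU-4 under several assumptions: one reference per formula, whitespace-tokenized LaTeX, clipped n-gram precisions combined by a geometric mean, the usual brevity penalty, and no smoothing.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[Sequence[str]], hypotheses: List[Sequence[str]],
                max_n: int = 4) -> float:
    """Corpus BLEU with one reference per hypothesis and no smoothing."""
    matches, totals = [0] * max_n, [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            # clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

For example, a prediction that is only the first four of seven reference tokens has perfect n-gram precision but is penalized by the brevity factor $e^{1 - 7/4}$.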