| With the continuous development of information technology, the demand for scientific literature in electronic form keeps growing, and the question of how to digitize scientific literature has attracted increasing attention and in-depth research. Mathematical formulas are an important part of much scientific literature and are often crucial to understanding its meaning, so digitizing mathematical formulas is particularly important. Problem input is an important research topic in intelligent question answering for middle school mathematics, and these problems also contain many mathematical formulas. Current OCR (Optical Character Recognition) technology can recognize Chinese and English characters as well as mathematical characters, but the complexity and diversity of formula structure and the ambiguity of symbols, among other factors, make OCR of mathematical formulas more difficult and keep its recognition accuracy low. In addition, because entering mathematical formulas manually is laborious, automatic and efficient formula recognition is a problem on which a breakthrough is needed. Mathematical formula recognition is one part of mathematical formula processing; the other parts are formula localization, formula analysis, and formula output. For printed mathematical formula recognition, the main object of study is the formula image. The structure of a mathematical formula is not simply one-dimensional but a complex two-dimensional arrangement: the same character expresses different meanings in different positions, character sizes are not uniform, and the characters span a wide range, including digits, letters, and arithmetic symbols. These factors make the segmentation and recognition of formula symbols difficult.
The two main parts of the mathematical formula recognition system are character segmentation and character recognition. Before segmenting the formula image, we preprocess it; preprocessing includes filtering, binarization, tilt correction, and refinement (thinning). Formulas are segmented with a method that combines projection segmentation and connected-domain segmentation, and the designed algorithm cleanly separates individual symbols. Each segmented symbol is then normalized in preparation for the subsequent feature extraction and recognition. To address the currently low recognition accuracy and the difficulty of recognizing commonly confused symbols, three groups of representative features are extracted: vertical and horizontal cross-sectional features, grid features based on feature pixels, and hole features. These features are complementary to some extent. The features are fed into CRFs (conditional random fields) for training to learn the corresponding CRF models, which are then tested on the data sets. Symbol recognition based on feature fusion reaches a recognition rate of 97.1%. The proposed approach achieves better recognition performance than traditional recognition methods. |
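
The combination of projection segmentation and connected-domain segmentation described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `column_runs`, `connected_components`, and `segment_symbols`, the 0/1 list-of-lists image format, and the use of 8-connectivity are all assumptions made for illustration. The sketch first cuts the binary image at empty columns (vertical projection) and then separates vertically stacked symbols inside each cut with connected-component labeling.

```python
from collections import deque

def column_runs(image):
    """Vertical projection: return [start, end) column ranges that contain ink."""
    height, width = len(image), len(image[0])
    projection = [sum(image[r][c] for r in range(height)) for c in range(width)]
    runs, start = [], None
    for c, count in enumerate(projection):
        if count > 0 and start is None:
            start = c                      # a run of inked columns begins
        elif count == 0 and start is not None:
            runs.append((start, c))        # an empty column ends the run
            start = None
    if start is not None:
        runs.append((start, width))
    return runs

def connected_components(image, col_start, col_end):
    """Bounding boxes (top, left, bottom, right), inclusive, of the 8-connected
    foreground regions inside columns [col_start, col_end)."""
    height = len(image)
    seen, boxes = set(), []
    for r in range(height):
        for c in range(col_start, col_end):
            if image[r][c] == 0 or (r, c) in seen:
                continue
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:                   # breadth-first flood fill
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and col_start <= nx < col_end
                                and image[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def segment_symbols(image):
    """Projection cut first, then connected components inside each cut.
    Boxes are ordered left to right, then top to bottom, within each cut."""
    boxes = []
    for start, end in column_runs(image):
        parts = connected_components(image, start, end)
        boxes.extend(sorted(parts, key=lambda b: (b[1], b[0])))
    return boxes

# Hypothetical toy image: a fraction (numerator, bar, denominator) followed by
# a separate vertical stroke. 1 = ink, 0 = background.
FORMULA = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 0],
]

for box in segment_symbols(FORMULA):
    print(box)
```

In this toy image the projection step alone would merge the numerator, fraction bar, and denominator into one block, which is why the connected-component pass is applied inside each column range. Symbols made of several disconnected strokes (for example "=" or "i") would be split by this sketch and would need an additional merging rule, which is not shown here.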