Research Of Automatic Recognition In Printed Mathematical Expression

Posted on: 2008-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: S C Tong
Full Text: PDF
GTID: 2178360215983336
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, it is used more and more frequently to disseminate and exchange information. Digital libraries and Internet-based distance learning are becoming hot research areas, so converting information resources into electronic form is a crucial problem. Existing OCR technology recognizes Chinese characters, English characters, and digits satisfactorily, but it performs poorly on mathematical expressions. Because mathematical expressions are two-dimensional and the same symbol can carry different meanings, segmenting them and analyzing their structure is very difficult. For the convenience of readers and to make literature more usable, the system described in this thesis converts mathematical expressions in scanned documents directly into a publication format such as LaTeX and reconstructs them there. The major work completed is as follows:

1. A function is designed for selecting the image of a mathematical expression conveniently and quickly.

2. Image preprocessing removes noise so that the essential parts of the symbols stand out and the image content can be identified correctly.

3. Feature extraction and selection are key to image recognition. Based on the characteristics of mathematical symbols, statistical features and structural features are extracted to form a 45-dimensional feature set. This set is not restricted by font, whereas a pixel feature set is strongly affected by font. The feature set is then optimized with the K-L orthogonal transformation to eliminate redundant information. Finally, through experiment and comprehensive comparison, a 39-dimensional feature set that gives more satisfactory recognition results is selected.

4. The Support Vector Machine (SVM) is a recent machine learning method developed on the basis of statistical learning theory. Symbol recognition in mathematical expressions is a multi-class problem with limited samples. In this thesis, symbols are recognized by combining two-class classifiers, namely the one-against-one method. Suitable parameters are found through a large number of experiments, and the experimental results are compared with those in the related literature; the recognition accuracy is improved.

5. Structural analysis is the biggest difference between a mathematical expression recognition system and a character recognition system, and it is also one of the main difficulties of mathematical expression recognition. Because image segmentation during preprocessing can over-segment a symbol, rules are set up to detect and merge multi-part symbols and function names. Structural analysis is based on a baseline algorithm, with a tree as the storage form and a top-down analysis strategy. The approach is conceptually simple, easy to implement, accurate, and fast, and it is suitable for real-time mathematical expression recognition systems.

6. TeX is internationally recognized as the best system for typesetting mathematical formulas, and LaTeX is based on TeX. Therefore, the mathematical expression in the image is converted into LaTeX form.

7. The whole system is implemented in Visual C++ 6.0. It runs fast and has a simple, convenient interface.
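
The one-against-one combination described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation (which is in Visual C++ 6.0 and uses SVMs): the nearest-centroid decider below is only a stand-in for a trained two-class SVM, and the names `train_one_vs_one`, `predict`, and the toy 2-D feature vectors are assumptions made for illustration. What it shows is the voting scheme itself: one binary classifier per pair of classes, with the final label chosen by majority vote.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(samples):
    """Build one binary decider per unordered pair of class labels.

    samples maps a label to its training vectors. Each decider here is a
    nearest-centroid rule, used only as a placeholder for a binary SVM.
    """
    centroids = {label: centroid(vecs) for label, vecs in samples.items()}
    deciders = {}
    for a, b in combinations(sorted(samples), 2):
        ca, cb = centroids[a], centroids[b]
        # Default arguments bind the current pair to this lambda.
        deciders[(a, b)] = lambda x, a=a, b=b, ca=ca, cb=cb: (
            a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        )
    return deciders


def predict(deciders, x):
    """Let every pairwise decider vote and return the most voted label."""
    votes = Counter(decide(x) for decide in deciders.values())
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for three symbol classes.
samples = {
    "+": [[1.0, 1.0], [1.2, 0.9]],
    "-": [[5.0, 1.0], [5.1, 1.2]],
    "=": [[5.0, 5.0], [4.8, 5.2]],
}
deciders = train_one_vs_one(samples)  # 3 classes give 3 pairwise deciders
print(predict(deciders, [1.1, 1.0]))  # +
```

With k symbol classes this scheme trains k(k-1)/2 binary classifiers, each on the samples of only two classes, which keeps every individual training problem small.
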
Keywords/Search Tags:Printed Mathematical Expression, Feature Extraction and Selection, K-L Transform, Support Vector Machine, Baseline Algorithm Structural Analysis, LaTeX
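
As a rough illustration of baseline-based structural analysis with a tree and a top-down strategy, the following Python sketch turns recognized symbols with bounding-box information into a LaTeX string. It is an assumption-laden simplification, not the algorithm of the thesis: the `Symbol` fields, the `ratio` threshold, and the rule "compare the vertical centre with that of the last baseline symbol" are all hypothetical, and only superscripts and subscripts are handled (no fractions, radicals, or merged multi-part symbols).

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    latex: str  # LaTeX text of the recognized symbol
    x: float    # left edge of the bounding box
    cy: float   # vertical centre; y grows downward as in image coordinates
    h: float    # bounding-box height


@dataclass
class Node:
    symbol: Symbol
    sup: list = field(default_factory=list)  # child nodes in the superscript region
    sub: list = field(default_factory=list)  # child nodes in the subscript region


def parse(symbols, ratio=0.3):
    """Top-down pass: find the baseline symbols, then recurse into regions."""
    raw = []  # entries are (baseline symbol, superscript symbols, subscript symbols)
    for s in sorted(symbols, key=lambda s: s.x):
        if raw:
            base = raw[-1][0]
            offset = base.cy - s.cy  # positive means s sits higher on the page
            if offset > ratio * base.h:
                raw[-1][1].append(s)
                continue
            if offset < -ratio * base.h:
                raw[-1][2].append(s)
                continue
        raw.append((s, [], []))
    return [Node(s, parse(sup, ratio), parse(sub, ratio)) for s, sup, sub in raw]


def to_latex(nodes):
    """Serialize the tree left to right, subscripts before superscripts."""
    out = []
    for n in nodes:
        out.append(n.symbol.latex)
        if n.sub:
            out.append("_{" + to_latex(n.sub) + "}")
        if n.sup:
            out.append("^{" + to_latex(n.sup) + "}")
    return "".join(out)


# Hypothetical symbol boxes for an image of "x squared plus y sub i".
symbols = [
    Symbol("x", 0, 50, 20),
    Symbol("2", 12, 38, 12),
    Symbol("+", 25, 50, 16),
    Symbol("y", 45, 52, 20),
    Symbol("i", 58, 62, 12),
]
print(to_latex(parse(symbols)))  # x^{2}+y_{i}
```

Because each superscript or subscript region is parsed by the same function, nested scripts come out as nested braces without any extra code.
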