
Research On Technology Of Optical Formula Recognition

Posted on: 2008-04-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Tian
Full Text: PDF
GTID: 1118360302982958
Subject: Optical Engineering
Abstract/Summary:
As an effective means of automatically entering printed formulas into computers, optical formula recognition can remedy a defect of existing optical character recognition systems, namely their failure to process mathematical formulas properly, and can further advance the digitization of scientific and technical documents. To date, optical formula recognition remains an unsolved problem of complex two-dimensional pattern recognition and analysis. This research focuses on its key techniques: formula symbol segmentation and recognition, formula structure analysis, and formula reconstruction. The main contributions can be summarized as follows:

1. To meet the special needs of formula recognition, a formula image pre-processing solution comprising image denoising, binarization, and skew and distortion correction is designed and implemented to improve recognition accuracy. The experimental results show its effectiveness.

2. A self-organizing segmentation algorithm with a feedback mechanism, based on hybrid strategies, is proposed to segment formula symbols arranged in two dimensions. It can merge or separate components and symbols according to the recognition result. For touching symbols, a method for segmenting vertically touching symbols is put forward, which separates the symbols on the basis of regularities induced from observed touching patterns. An algorithm for segmenting diagonally or horizontally touching symbols is also designed. In addition, a global approach is employed for groups of multiple touching symbols that appear frequently in formulas: each group is processed as a single object with a corresponding feature dictionary. The experimental results show that the method achieves high accuracy.

3. A jumping function is defined to represent formula images, and a rapid method for extracting directional element features based on the jumping function is proposed for symbol recognition, which avoids the time-consuming operation of contour extraction. In addition, a formula symbol recognizer with a hierarchical structure is designed. As a result, both the accuracy and the efficiency of symbol recognition are improved.

4. In formula structural analysis, a Maximum Matching word segmentation algorithm that incorporates the geometric information of formula symbols is developed to identify symbol strings based on formula characteristics. To address the problems of traditional structural analysis methods, a semantic method for locating the dominant baseline is proposed, which obtains the start symbol accurately by analyzing the semantic relationships among symbols. To find the implicit calculation relationships among symbols that are expressed by spatial features, a fuzzy classification method for symbol function is given, which uses fuzzy logic to identify the function of each symbol. To improve analysis accuracy, a dynamic approach is also designed that assigns symbols to their corresponding baselines according to the integrated features of the symbols. These methods improve both the accuracy and the robustness of formula structural analysis.

5. In formula reconstruction, an intelligent matching algorithm is presented that determines the LaTeX parameters from the layout of the formula, and thereby realizes exact reconstruction of formulas.
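
As a rough illustration of the Maximum Matching idea mentioned in contribution 4, the sketch below groups recognized symbols into multi-character names such as "sin" only when the characters form a dictionary word and also sit next to each other on the same line. It is not the dissertation's algorithm: the `Symbol` fields, the `LEXICON` contents, the `adjacent` test, and its thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    label: str   # recognized character, e.g. "s"
    x: float     # left edge of the bounding box
    y: float     # vertical center of the bounding box
    w: float     # bounding-box width
    h: float     # bounding-box height


# Hypothetical dictionary of multi-character names (an assumption).
LEXICON = {"sin", "cos", "tan", "log", "ln", "lim", "max", "min", "exp"}
MAX_LEN = max(len(word) for word in LEXICON)


def adjacent(a: Symbol, b: Symbol,
             gap_ratio: float = 0.5, shift_ratio: float = 0.25) -> bool:
    """Geometric check with assumed thresholds: b follows a closely
    and the two vertical centers are roughly aligned."""
    height = max(a.h, b.h)
    gap = b.x - (a.x + a.w)
    return gap <= gap_ratio * height and abs(a.y - b.y) <= shift_ratio * height


def segment(symbols: list[Symbol]) -> list[str]:
    """Forward maximum matching over symbols sorted left to right."""
    result = []
    i = 0
    while i < len(symbols):
        matched = 1  # fall back to a single symbol
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(symbols) - i), 1, -1):
            window = symbols[i:i + n]
            word = "".join(s.label for s in window)
            if word in LEXICON and all(
                adjacent(p, q) for p, q in zip(window, window[1:])
            ):
                matched = n
                break
        result.append("".join(s.label for s in symbols[i:i + matched]))
        i += matched
    return result


def make_row(labels: str, raised: frozenset = frozenset()) -> list[Symbol]:
    """Build evenly spaced test symbols; indices in `raised` are shifted up."""
    return [
        Symbol(ch, x=10.0 * k, y=(-3.0 if k in raised else 5.0), w=8.0, h=10.0)
        for k, ch in enumerate(labels)
    ]
```

With these assumed thresholds, `segment(make_row("sinx"))` groups the first three symbols into one string, while the same letters with the "n" raised like a superscript stay separate, because the geometric check rejects the dictionary match.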
Keywords/Search Tags: Optical character recognition, Optical formula recognition, Formula symbol segmentation, Formula symbol recognition, Formula structural analysis, Formula reconstruction, Jumping function