
The Design And Realization Of Printed Mathematical Expression Recognition System

Posted on: 2006-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: X R Xu
Full Text: PDF
GTID: 2168360155971552
Subject: Computer software and theory
Abstract/Summary:
Computerized document-handling systems are widely used, but few provide functions for recognizing and understanding mathematical expressions printed in documents. The system proposed in this thesis recognizes mathematical expressions in files scanned directly from paper and reconstructs the recognized expressions in a publication format such as LaTeX or Word. The system works as follows.

Preprocessing. Preprocessing is a very important step before character recognition, and its quality directly influences the quality of the recognition result. Good preprocessing makes recognition easier and the results better; poor preprocessing makes recognition difficult and can even cause errors. We first briefly introduce each preprocessing step and its corresponding method. Then, through an analysis of commonly used normalization algorithms, we put forward a combined algorithm that is very effective for mathematical symbols of different sizes and with large length-to-width ratios.

Feature selection and extraction. A symbol in an image file cannot be classified directly; we must first extract features that are highly stable and discriminative. We introduce two methods for selecting and extracting features. The first is the traditional approach based on the characteristics of mathematical symbols. The second uses the K-L transform to extract global features directly from symbol images, which reduces the dimensionality of the feature space while retaining useful information.

Symbol recognition. The classifier is the core of the system. The support vector machine (SVM) has been a research focus of statistical learning theory in recent years. SVM classifiers overcome shortcomings of the commonly used pattern-recognition methods and effectively improve the recognition rate. We use a multi-class SVM classifier to recognize the symbols and obtain a higher recognition rate.

Structural recognition. Images are over-segmented in the earlier processing for the convenience of later stages, so the pieces of a symbol and the characters of a function name must be combined again. The problem of understanding a complicated mathematical expression in a printed document has not yet been completely solved. We introduce a formation algorithm based on the baseline. The structure of a recognized expression is then represented as a tree, and the original expression can be reproduced in a suitable format such as LaTeX.

The experimental results at the end of the thesis demonstrate the feasibility of the system, but the proposed model still needs further improvement before commercial application.
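
The abstract does not give the details of the baseline-based formation algorithm, so the following is only a minimal sketch of the general idea, not the method of the thesis. It assumes each recognized symbol arrives with a label and a bounding box, decides from the vertical position relative to the preceding baseline symbol whether a symbol is a superscript, a subscript, or the next symbol on the baseline, stores the result as a tree, and prints the tree as LaTeX. The names `Symbol`, `Node`, `build_row`, `to_latex`, and the value of `THRESHOLD` are hypothetical, and fractions, radicals, and matrices are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical tuning value: fraction of the base symbol's height used to
# decide whether a neighbour sits above or below the baseline zone.
THRESHOLD = 0.25


@dataclass
class Symbol:
    label: str     # recognized symbol, already given as a LaTeX token
    x: float       # left edge of the bounding box
    top: float     # top edge (image coordinates, y grows downward)
    bottom: float  # bottom edge


@dataclass
class Node:
    label: str
    sup: List["Node"] = field(default_factory=list)  # superscript subtree
    sub: List["Node"] = field(default_factory=list)  # subscript subtree


def build_row(symbols: List[Symbol]) -> List[Node]:
    """Group symbols into baseline nodes with superscript/subscript children."""
    symbols = sorted(symbols, key=lambda s: s.x)  # read left to right
    row: List[Node] = []
    i = 0
    while i < len(symbols):
        base = symbols[i]
        height = base.bottom - base.top
        sup: List[Symbol] = []
        sub: List[Symbol] = []
        i += 1
        # Attach following symbols to this base while they lie clearly
        # above or below its vertical extent.
        while i < len(symbols):
            s = symbols[i]
            centre = (s.top + s.bottom) / 2
            if centre < base.top + THRESHOLD * height:
                sup.append(s)
            elif centre > base.bottom - THRESHOLD * height:
                sub.append(s)
            else:
                break  # back on the baseline: start a new node
            i += 1
        # Scripts can carry scripts of their own, so recurse.
        row.append(Node(base.label, build_row(sup), build_row(sub)))
    return row


def to_latex(row: List[Node]) -> str:
    """Serialize the tree produced by build_row as a LaTeX string."""
    parts = []
    for node in row:
        text = node.label
        if node.sup:
            text += "^{" + to_latex(node.sup) + "}"
        if node.sub:
            text += "_{" + to_latex(node.sub) + "}"
        parts.append(text)
    return " ".join(parts)


# Made-up bounding boxes for the expression "x squared plus y sub i".
example = [
    Symbol("x", x=0, top=10, bottom=20),
    Symbol("2", x=11, top=4, bottom=10),
    Symbol("+", x=18, top=12, bottom=18),
    Symbol("y", x=26, top=10, bottom=22),
    Symbol("i", x=33, top=19, bottom=25),
]
latex = to_latex(build_row(example))
print(latex)
```

Keeping the tree separate from the output step means the same structure could be serialized to another target format; only `to_latex` would need a counterpart.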
Keywords/Search Tags: Mathematical expression recognition, Normalization, Feature selection and extraction, K-L transform, Support vector machine (SVM), Symbol recognition, Structural recognition