The Design And Implementation Of Recognition System For Mathematics Expressions Printed In Document

Posted on: 2006-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: B D Zhu
GTID: 2168360152985435
Subject: Computational Mathematics
Abstract/Summary:
Computerized document-handling systems are widely used, but few of them provide functions for recognizing and understanding mathematical expressions printed in documents. The system proposed in this thesis recognizes mathematical expressions in files scanned directly from paper and reconstructs the recognized expressions in a publication format such as LaTeX or Word. The system works as follows:

Merged-symbol segmentation. Because of printer quality, binarization, and similar factors, symbols in a scanned document may be merged and therefore cannot be easily recognized. We propose a new method based on a self-organizing feature map to segment merged symbols. By modifying the classic updating rule of the self-organizing map, we obtain a network that approximates the distribution of white pixels between two symbols in less training time and with fewer units.

Feature extraction and selection. A symbol in an image file cannot be classified directly, because its raw pixel representation is not invariant to image translation, orientation, and size changes. We investigate three kinds of moment features used as shape descriptors: regular moments, Zernike moments, and B-spline wavelet moments. We also use a PCA neural network to select principal features, which reduces the dimension of the feature space while retaining useful information.

Character recognition. The recognizer is a key part of our system. Neural networks, which overcome the disadvantages of traditional pattern recognition methods, have been used extensively in OCR and have achieved higher recognition rates. We use an SOFM network as a rough classifier, which sorts similar symbols into the same group, and then a BP network as a fine classifier, which identifies symbols within one group.
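To make the invariance requirement in the feature-extraction step concrete, the following is a minimal sketch of a shape descriptor built from regular moments. It assumes a binary image stored as rows of 0/1 values and uses Hu's first invariant (the sum of the second-order normalized central moments) purely as an illustration; the abstract does not say which invariants the thesis derives, and the names `SYMBOL`, `place`, `rotate90`, and `hu_phi1` are hypothetical.

```python
def raw_moment(img, p, q):
    """Regular (geometric) moment m_pq of a binary image (rows of 0/1)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalized_central_moment(img, p, q):
    """eta_pq: moment about the centroid, divided by m00^(1 + (p+q)/2)."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00  # centroid x removes translation
    cy = raw_moment(img, 0, 1) / m00  # centroid y removes translation
    mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
             for y, row in enumerate(img)
             for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2)


def hu_phi1(img):
    """Hu's first invariant: eta20 + eta02."""
    return (normalized_central_moment(img, 2, 0)
            + normalized_central_moment(img, 0, 2))


def place(shape, height, width, top, left):
    """Copy a small shape onto an empty canvas at the given offset."""
    canvas = [[0] * width for _ in range(height)]
    for r, row in enumerate(shape):
        for c, v in enumerate(row):
            canvas[top + r][left + c] = v
    return canvas


def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Hypothetical L-shaped test symbol.
SYMBOL = [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]]

if __name__ == "__main__":
    a = place(SYMBOL, 8, 8, 0, 0)
    b = place(SYMBOL, 8, 8, 4, 3)
    print(hu_phi1(a), hu_phi1(b), hu_phi1(rotate90(a)))
```

The three printed values agree up to floating-point rounding, because the centroid shift cancels translation and a 90-degree rotation only swaps the two second-order terms. On a pixel grid, invariance to size changes holds only approximately.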
We also introduce confidence analysis into character recognition and discuss its two main applications: estimating the recognition rate, and selecting the rejection region that gives the best compromise between recognition rate and rejection rate.

Expression formation. This part is presented in Appendix C so that the description of the system is complete.

Parts of this thesis are results of joint research with Mr. Hou, which are described in detail in our paper "A Segmentation Method for Merged Characters Using Self-Organizing Map Neural Networks".
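The trade-off between recognition rate and rejection rate mentioned above can be illustrated with a small sketch. It assumes each recognized symbol comes with a confidence score in [0, 1] and a known correct/incorrect label on a validation set, and it picks the rejection threshold by minimizing a weighted cost. The cost weights, the sample data, and the function names are hypothetical; the abstract does not state the criterion the thesis actually uses.

```python
def rates_at_threshold(samples, threshold):
    """Return (recognition_rate, error_rate, rejection_rate).

    samples: list of (confidence, is_correct) pairs.
    A sample is accepted when its confidence is >= threshold;
    all three rates are fractions of the full sample set.
    """
    n = len(samples)
    accepted = [ok for conf, ok in samples if conf >= threshold]
    correct = sum(accepted)
    errors = len(accepted) - correct
    rejected = n - len(accepted)
    return correct / n, errors / n, rejected / n


def best_threshold(samples, error_cost=5.0, reject_cost=1.0):
    """Pick the threshold minimizing error_cost*errors + reject_cost*rejects.

    The cost weights are illustrative assumptions: here one wrong
    answer is treated as five times as costly as one rejection.
    """
    candidates = sorted({conf for conf, _ in samples} | {0.0})

    def cost(t):
        _, err, rej = rates_at_threshold(samples, t)
        return error_cost * err + reject_cost * rej

    return min(candidates, key=cost)


# Hypothetical validation results: (confidence, classifier was correct).
SAMPLES = [(0.95, True), (0.90, True), (0.85, True), (0.60, False),
           (0.55, True), (0.40, False), (0.30, False), (0.99, True)]

if __name__ == "__main__":
    t = best_threshold(SAMPLES)
    print(t, rates_at_threshold(SAMPLES, t))
```

Raising `error_cost` pushes the threshold up, trading a higher rejection rate for fewer misrecognized symbols. Lowering it does the opposite.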
Keywords/Search Tags: character segmentation, moment invariants, principal component analysis, self-organizing feature map, BP neural network