Recognition Method Of Mathematical Expression Notations Based On Combined Invariant Moments And Neural Network

Posted on: 2013-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Cai
GTID: 2248330371988726
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of Internet technology, digital libraries and distance education are becoming increasingly popular. To facilitate computer processing and transmission over the Internet, more and more literature is released directly as electronic documents, which contain not only ordinary text, images and graphics but also a large number of mathematical formulas. Because these formulas are usually stored as pictures, they take up more storage and are inconvenient to transmit online. Mathematical formula recognition converts formula pictures into an editable form, so it plays an increasingly important role. At present, mainstream OCR (Optical Character Recognition) software cannot handle mathematical formulas correctly; if formulas could be recognized automatically, this would have a large impact on the automated storage and network transmission of literature.

Mathematical formula recognition consists of symbol recognition and structure analysis. Symbol recognition is the foundation: whether the symbols are recognized correctly directly affects the final result. Symbol recognition covers both handwritten and printed symbols; this thesis focuses on printed mathematical expression symbols. After decades of development, research on formula symbol recognition has made great progress. However, few papers deal specifically with recognizing the symbols of mathematical formulas in literature, and most of them target a single font and require size normalization. Although normalization reduces sensitivity to symbol size to some degree and helps feature extraction, it tends to deform or distort the symbol images.
To recognize mathematical formula symbols of different fonts and sizes automatically, this thesis, after an in-depth comparison, proposes a symbol recognition method based on combined invariant moments and a BP neural network. The method omits the usual normalization step. Moment invariant features are extracted first, then selected with principal component analysis, and finally classified by the BP neural network, which eliminates the differences caused by character size and stroke thickness. The main work of the thesis is as follows:

1. The symbol images are only thinned, which keeps the basic shape information while compressing the original image data and removing redundant information. Features are then extracted by relying on the scale invariance of moment invariants, so the normalization step of the usual approach is omitted. Compared with normalization followed by thinning, this method is less prone to deformation and distortion, and it eliminates the errors caused by stroke thickness, which benefits the subsequent feature extraction.

2. Because mathematical formula symbols are numerous and appear in many fonts, features are extracted as moment invariants, which have minimal information redundancy, good noise resistance, and invariance to translation, rotation and scale. The thesis uses combined moment invariant features: the Hu moments, the affine invariant moments, the normalized moment of inertia and second-order moments. These features have good invariance, do not depend on direction, and are reasonably robust to noise, which suits the requirements of image processing.

3. Feature extraction yields many features. The more features there are, the more storage they occupy and the longer learning and recognition take, and some features may be irrelevant or invalid, so feature selection is needed. SVD and principal component analysis are used to remove redundant features and keep the most effective ones, which benefits the training of the neural network and the final recognition result.

Finally, the BP neural network was used to recognize images of 95 common mathematical symbols that appear in a Tongji University textbook, and the recognition rate reached 92.63%. The experimental results show that the method recognizes these symbols well.
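The feature extraction and selection steps described above might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it computes the first four of Hu's seven moment invariants (the affine invariant moments, the normalized moment of inertia and the BP network are left out), uses NumPy only, and works on two made-up glyphs. The names `hu_first_four`, `pca_svd`, `variants`, `glyph_l` and `glyph_bar` are hypothetical.

```python
import numpy as np


def hu_first_four(img):
    """First four of Hu's seven moment invariants of a 2-D image array."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):
        # Central moment mu_pq, divided by m00^(1 + (p+q)/2) for scale invariance.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ])


def pca_svd(features, k):
    """Project row-wise feature vectors onto their top-k principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()  # share of variance per component
    return centered @ vt[:k].T, explained


def variants(glyph):
    """Original, 3x enlarged, shifted and 90-degree rotated copies of a glyph."""
    return [
        glyph,
        np.kron(glyph, np.ones((3, 3))),
        np.pad(glyph, ((10, 0), (0, 7))),
        np.rot90(glyph),
    ]


# Two made-up binary glyphs standing in for symbol images (assumption).
glyph_l = np.zeros((40, 40))
glyph_l[5:35, 5:12] = 1
glyph_l[28:35, 5:30] = 1
glyph_bar = np.zeros((40, 40))
glyph_bar[18:22, 5:35] = 1

features = np.array(
    [hu_first_four(v) for g in (glyph_l, glyph_bar) for v in variants(g)]
)
reduced, explained = pca_svd(features, k=2)
print(np.round(features, 4))
print(np.round(explained, 4))
```

In this sketch the four variants of each glyph give nearly identical feature rows without any size normalization, while the two glyphs remain clearly separated. On a pixel grid the scale invariance is only approximate, so the first invariant of the enlarged copy differs slightly from the original. The reduced feature rows are what a classifier such as a BP neural network would then be trained on.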
Keywords/Search Tags: combined invariant moments, affine invariant moments, principal component analysis, BP neural network