
Method Research And System Design Of Printed Mathematical Formula Recognition Based On SVM

Posted on: 2016-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Bai
Full Text: PDF
GTID: 2298330467489695
Subject: Computer software and theory
Abstract/Summary:
Optical Character Recognition (OCR) is a recognition technology that has been widely applied in banking, post and telecommunications, logistics and other fields in recent years. Its purpose is to convert printed or handwritten characters, which are input to the system as images, into editable symbols. At present, the recognition of English letters, Chinese characters and Arabic numerals in printed documents has reached a high level. For mathematical formulas, however, the numerous symbol varieties, large variations and complicated structures make it difficult to achieve recognition that is both correct and fast, so more effective recognition methods need to be explored.

This thesis addresses several key problems of printed mathematical formula recognition. It focuses on correcting skewed images rapidly and accurately, segmenting adhesive (touching) symbols, and recognizing symbols with multiple SVM-based classifiers.

This thesis proposes a skew correction method based on connected area analysis and the Hough Transform to improve the efficiency and accuracy of detecting the layout angle. The method first estimates the skew angle from the connected areas. It then divides the text regions based on the longer connected areas. Finally, the precise skew angle is obtained by applying the Hough Transform, in different angle steps, to the layout areas detected by edge detection. At the same time, the thesis locates the positions at which adhesive symbols are to be segmented, and then verifies the segmented symbols against the recognition results.

Because formula symbols are of many kinds, their features need to be selected and classified in detail in order to reduce the burden on the classifiers and improve their accuracy. On this basis, this thesis builds multiple classifiers with coarse classification features and fine classification features. In the fine classification stage, it uses the one-to-many method in place of the one-to-one method in the DAG-SVM training model, which improves the training efficiency of the classifiers. It also uses the separability among classes to readjust the nodes in the DAG-SVM, which reduces the influence of error accumulation on the recognition results. According to experiments and analysis, the methods proposed in this thesis can detect the layout angle efficiently, segment adhesive characters accurately and recognize formula symbols effectively.

Based on the above methods, this thesis designs and implements a printed mathematical formula recognition system with VC++. The system takes the formula images contained in documents as input and, through the steps of layout analysis, formula image preprocessing, formula symbol recognition and formula structure analysis, outputs the results in LaTeX form. According to the analysis of the recognition results, recognizing mathematical symbols with the improved SVM classifiers achieves a recognition rate of 94.7%, which is higher than the recognition rates of other existing SVM classifiers.
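
To make the idea of readjusting DAG-SVM nodes by class separability more concrete, the following is a minimal sketch in Python, not the thesis's implementation (the system itself is written in VC++). It shows the standard decision-DAG evaluation, in which each pairwise test eliminates one class, and places the most separable pair of classes at the root so that the earliest decision is the least error-prone. Everything specific here is an assumption made for illustration: the function names, the use of squared centroid distance as the separability measure, and the nearest-centroid rule that stands in for a trained binary SVM. The abstract does not state which separability measure the thesis uses, and the sketch does not reproduce its one-to-many training scheme.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Decider = Callable[[Point], str]  # returns the winning class label


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_toy_deciders(centroids: Dict[str, Point]) -> Dict[Tuple[str, str], Decider]:
    """Hypothetical stand-in for trained binary SVMs: one nearest-centroid
    decider per pair of classes."""
    deciders: Dict[Tuple[str, str], Decider] = {}
    for a, b in combinations(centroids, 2):
        def decide(x: Point, a: str = a, b: str = b) -> str:
            return a if sq_dist(x, centroids[a]) <= sq_dist(x, centroids[b]) else b
        deciders[(a, b)] = decide
    return deciders


def order_by_separability(centroids: Dict[str, Point]) -> List[str]:
    """Assumed heuristic: put the two classes with the largest centroid
    distance at the ends of the class list, so the DAG root tests them first."""
    labels = list(centroids)
    first, last = max(
        combinations(labels, 2),
        key=lambda pair: sq_dist(centroids[pair[0]], centroids[pair[1]]),
    )
    middle = [c for c in labels if c not in (first, last)]
    return [first] + middle + [last]


def dag_predict(x: Point, order: List[str],
                deciders: Dict[Tuple[str, str], Decider]) -> str:
    """Evaluate the decision DAG: compare the first and last remaining
    classes, drop the loser, and repeat until one class is left."""
    remaining = list(order)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        decide = deciders.get((first, last)) or deciders[(last, first)]
        if decide(x) == first:
            remaining.pop()       # 'last' is eliminated
        else:
            remaining.pop(0)      # 'first' is eliminated
    return remaining[0]


if __name__ == "__main__":
    # Made-up 2-D centroids for four symbol classes (illustrative only).
    centroids = {"+": (0.0, 0.0), "-": (1.0, 0.0), "=": (5.0, 5.0), "x": (10.0, 10.0)}
    order = order_by_separability(centroids)
    deciders = build_toy_deciders(centroids)
    print(order, dag_predict((0.9, 0.1), order, deciders))
```

With N classes the DAG needs only N - 1 pairwise evaluations per sample, but a wrong decision near the root cannot be undone further down. That is why the order of the nodes matters, and why testing well-separated classes first is one plausible way to limit error accumulation.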
Keywords/Search Tags: Formula recognition, SVM, Skew correction, Adhesion segmentation, Feature extraction