Research On Recognition Technology And System Design Of Printed Mathematical Formula

Posted on: 2021-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Lin
GTID: 2428330611466803
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the continuous development of the Internet and information technology, electronic documents have found wide application because they are convenient and fast to use. At the same time, as pattern recognition technology matures, more demands are being placed on the recognition and intelligent analysis of electronic documents. At present, OCR (Optical Character Recognition) technology is widely used in the intelligent analysis of electronic documents and can efficiently identify large numbers of Chinese and English characters, but it still cannot recognize mathematical formulas. By studying the key technologies of printed mathematical expression recognition, this paper builds a printed mathematical expression recognition system.

First, several binarization methods are compared experimentally by their segmentation results, and the global threshold method is adopted. Characters are then segmented with projection segmentation and connected-domain segmentation algorithms. Second, for character recognition, a template library of mathematical formula characters is constructed. It takes into account common fonts, font sizes, italics, bold italics, and other character styles, and covers upper- and lowercase English letters, digits, Greek letters, and common mathematical symbols, for a total of 191 categories and 22,242 characters. Recognition uses the template matching method. Tested on the public data sets Infty-CDB-3-B and Infty-MDB-1 and on a data set of formulas extracted from mathematical literature, the method achieves an average correct recognition rate of 97.10%. A multi-layer classifier based on the number of holes and the aspect ratio is used to optimize template matching and reduce its computational complexity. A support vector machine-based classifier, used for performance comparison, achieves a correct recognition rate of 95.43%. Among the incorrectly recognized characters, misrecognition of the character '.' accounts for 79.44%. The errors of the two classification algorithms are concentrated on the digit '1' and the lowercase letter 'l', the letter 'o' and the digit '0', and the upper- and lowercase pairs 'Oo', 'Ss', and 'Vv'.

For structural analysis, this paper establishes a formula analysis system based on character trees. A character structure tree is built; formula reorganization rules are defined from joint characters, positional relationships, and specific formula types; and a structure analysis process is constructed from the horizontal and vertical distribution of the formula, giving an overall analysis based on the character structure tree. The algorithm is simple and avoids backtracking. The formula reorganization experiments show correct recognition rates of 86.77% for superscripts and subscripts, 95.37% for combined characters, 100% for root-type formulas, 98.97% for fraction-type formulas, and 90.48% for upper-lower structure formulas.

Finally, the paper implements a printed mathematical expression recognition system in the MATLAB development environment with a GUI. A comparison with the existing formula recognition software Infty Reader and Math Pix shows that Math Pix has the best performance and recognition results. The proposed system outperforms Infty Reader in character recognition, joint symbol recognition, and upper-lower structure recognition, but its computation speed is slower than that of Infty Reader.
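The preprocessing and recognition steps named in the abstract (global thresholding, projection segmentation, and template matching pre-filtered by hole count and aspect ratio) can be illustrated with a minimal sketch. This is not the thesis implementation, which was built in MATLAB. The function names, the fixed threshold of 128, the 16x16 comparison size, the aspect-ratio tolerance, and the pixel-agreement score are all assumptions chosen for illustration.

```python
from collections import deque

import numpy as np


def binarize_global(gray, threshold=128):
    """Global threshold: pixels darker than `threshold` count as ink (True)."""
    return np.asarray(gray) < threshold  # 128 is an assumed value


def projection_segments(binary):
    """Vertical projection: return (start, end) column ranges containing ink."""
    has_ink = binary.any(axis=0)
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(has_ink)))
    return segments


def crop_to_ink(binary):
    """Trim blank rows and columns (assumes at least one ink pixel)."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def count_holes(glyph):
    """Count enclosed background regions (4-connected) inside a glyph."""
    padded = np.pad(glyph, 1, constant_values=False)
    seen = padded.copy()  # ink pixels are treated as already visited
    height, width = padded.shape
    regions = 0
    for r in range(height):
        for c in range(width):
            if seen[r, c]:
                continue
            regions += 1
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return regions - 1  # the outer background is not a hole


def resize_nearest(glyph, size=(16, 16)):
    """Nearest-neighbour resize so glyphs of different sizes can be compared."""
    height, width = glyph.shape
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return glyph[np.ix_(rows, cols)]


def classify(glyph, templates, ratio_tolerance=0.5):
    """Return (label, score); `templates` maps label -> cropped boolean array."""
    holes = count_holes(glyph)
    ratio = glyph.shape[1] / glyph.shape[0]
    resized = resize_nearest(glyph)
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        # Coarse layers: skip templates whose hole count or aspect ratio differ.
        if count_holes(template) != holes:
            continue
        if abs(template.shape[1] / template.shape[0] - ratio) > ratio_tolerance:
            continue
        # Fine layer: fraction of pixels that agree after resizing (assumed score).
        score = float(np.mean(resized == resize_nearest(template)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

In this sketch the coarse filters run before any pixel comparison, so a glyph with one hole is never compared against templates with none. That is one plausible reading of how a multi-layer classifier could reduce the computational cost of template matching.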
Keywords/Search Tags: Template matching method, Structural analysis system, Mathematical formula recognition