Research On Mathematical Formula Recognition Algorithm And Its Application In Book Content Recognition System

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Jia
GTID: 2518306575967199
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of information technology, electronic books have become an important carrier of knowledge. Mathematical formulas are an important part of the page content of books, and recognizing them requires recognizing not only the individual characters but also the structure of the formula. Formula recognition has therefore long been a difficulty in the electronic conversion of physical books. This thesis designs a method for mathematical formula recognition that uses a convolutional neural network for character recognition and uses the scope of operation symbols together with the center line of characters to recognize formula structure.

The method recognizes a formula in three steps: formula character cutting, character recognition, and formula structure recognition. First, connected-domain analysis combined with rules is used to cut common mathematical formulas into characters. Second, the character recognizer designed in this thesis consists of three convolution layers and three fully connected layers; the convolution kernel size is 3 × 3, and the final fully connected layer has 275 neurons. The accuracy of this character recognizer can reach 98.90%. Third, for formula structure recognition, a method based on operation symbols and the center line is proposed. Operation symbols are divided into three categories: the fraction bar, special symbols, and binary operation symbols. Special symbols include the root sign, the summation symbol, and the integral symbol. These operators have multiple scopes. The algorithm first separates and recognizes the subformulas within these scopes, and then inserts the recognition results directly into the LaTeX expression of the operator. After that, the method identifies the superscript and subscript relationships between characters. Because characters are constrained by the four-line grid when written, the concept of a center line is proposed, and common characters are classified into four types: upward, downward, fully occupied, and center. The method can comprehensively identify the structure of common mathematical formulas.

Based on the characteristics of books, the thesis designs and implements a book content recognition system with a client-server architecture. After page preprocessing and page image segmentation on the client side, the segmentation results are sent to the server. There, a page target classification model classifies each segment, and the text lines and formulas are recognized. The server then sends the page target recognition results back to the client, which finally generates an editable file. A CRNN model is used for text recognition, a neural network for character recognition, and the method proposed in this thesis for formula structure recognition.
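
The center-line idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Char` record, the character-to-type table, the assumption that the four-line grid forms three zones of equal height, the resulting center-line ratios, and the `tol` threshold are all hypothetical choices made for illustration. Each character's center line is estimated from its bounding box and type, and a character whose center line lies clearly above or below that of the preceding base character is emitted as a LaTeX superscript or subscript.

```python
from dataclasses import dataclass

# Hypothetical type table; the abstract names the four types but not their members.
UPWARD = set("bdfhklt0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DOWNWARD = set("gpqy")
FULL = set("j()[]{}")

# Position of the center line as a fraction of box height, measured from the
# top of the box, assuming three grid zones of equal height (an assumption).
CENTER_RATIO = {"upward": 0.75, "downward": 0.25, "full": 0.5, "center": 0.5}


@dataclass
class Char:
    label: str     # recognized character
    x: float       # left edge of the bounding box
    top: float     # top edge (image coordinates, y grows downward)
    height: float  # bounding-box height


def char_type(label):
    if label in UPWARD:
        return "upward"
    if label in DOWNWARD:
        return "downward"
    if label in FULL:
        return "full"
    return "center"  # everything else, e.g. a, c, x, +


def centerline(ch):
    """Estimated y coordinate of the middle of the character's x-height zone."""
    return ch.top + CENTER_RATIO[char_type(ch.label)] * ch.height


def x_zone_height(ch):
    """Estimated height of the x-height zone, used to normalize offsets."""
    kind = char_type(ch.label)
    if kind in ("upward", "downward"):
        return ch.height / 2  # box spans two zones
    if kind == "full":
        return ch.height / 3  # box spans three zones
    return ch.height          # box spans the x-height zone only


def relation(base, cur, tol=0.5):
    """Classify cur relative to base as 'sup', 'sub' or 'same'."""
    offset = (centerline(base) - centerline(cur)) / x_zone_height(base)
    if offset > tol:
        return "sup"
    if offset < -tol:
        return "sub"
    return "same"


def to_latex(chars):
    """Emit a flat LaTeX string; nested scripts are not handled in this sketch."""
    ordered = sorted(chars, key=lambda c: c.x)
    if not ordered:
        return ""
    pieces = [ordered[0].label]
    base = ordered[0]
    pending_kind, pending = None, ""  # script run being collected

    def flush():
        mark = "^" if pending_kind == "sup" else "_"
        pieces.append(mark + "{" + pending + "}")

    for cur in ordered[1:]:
        kind = relation(base, cur)
        if kind != pending_kind and pending:
            flush()
            pending = ""
        if kind == "same":
            pieces.append(cur.label)
            base, pending_kind = cur, None
        else:
            pending += cur.label
            pending_kind = kind
    if pending:
        flush()
    return "".join(pieces)


# Made-up bounding boxes for the formula x^2 + y_1.
demo = [
    Char("x", 0, 80, 20),
    Char("2", 12, 60, 20),
    Char("+", 24, 82, 16),
    Char("y", 40, 80, 40),
    Char("1", 52, 95, 20),
]
print(to_latex(demo))  # prints x^{2}+y_{1}
```

Comparing center lines rather than box tops or bottoms is what lets characters of different types be placed on the same line: in the demo, "x" (center type) and "y" (downward type) have different boxes but the same estimated center line, so "y" is not mistaken for a subscript.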
Keywords/Search Tags: Formula recognition, Character recognition, Structure recognition, System design