
Handwritten Mathematics Formula Recognition Based On Bayesian Program Learning Data Enhancement

Posted on: 2019-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2428330566497946
Subject: Computer Science and Technology
Abstract/Summary:
Mathematical formulas play an important role in mathematics, physics, and many other fields. With the development and spread of smart devices such as handwriting pads and tablet PCs, many research institutions have turned their attention to handwritten mathematical formula recognition. Traditional recognition methods work in phases: they first segment the formula into single characters, classify each character, and then apply grammar rules to analyze the two-dimensional structure of the recognition results. However, segmentation, recognition, and two-dimensional structural analysis are deeply coupled, so structural analysis modules often contain very complex and poorly readable algorithms. The traditional phased approach has three obvious disadvantages: segmentation errors degrade recognition performance, single-character recognition ignores contextual information, and two-dimensional structural analysis relies excessively on hand-written rules.

To address these three problems, this thesis implements a handwritten formula recognition model based on the encoder-decoder framework, which recognizes handwritten mathematical formulas end to end. In the encoding stage, in addition to using a conventional CNN to extract features from the image, a BLSTM re-encodes the features on top of the CNN layers, so that the resulting features take contextual information into account. In the decoding stage, an attention mechanism performs an implicit alignment between the input features and the recognition results, which avoids explicit segmentation of the formula; the resulting intermediate vectors are then fed into an LSTM for decoding. To remove the excessive reliance on hand-written rules in two-dimensional structural analysis, the LaTeX string of the entire formula is used as its label, since LaTeX can express both the structure and the semantics of a formula. To give the model more data, this thesis also explores generative models and uses the Bayesian Program Learning (BPL) framework to successfully generate handwritten formula data.

To verify the recognition ability of the model, we ran experiments on the data sets of the CROHME handwritten formula recognition competition. On the 2014 test set the model ranks second, with a formula recognition rate of 41.78%, which is 4.56% higher than the third-place system. On the 2016 standard test set the formula accuracy is 45.77%, which ranks third overall. After adding the data generated by the BPL model, formula accuracy on the 2014 and 2016 test sets increased by 3.04% and 3.57%, respectively; the character-based BLEU score reached 74.70% and the edit-distance accuracy reached 79.45%. These experiments show that the model performs well on both the 2014 and 2016 test sets.
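The implicit alignment performed by the attention mechanism in the decoding stage can be illustrated with a small sketch. The abstract does not state which scoring function the thesis uses, so the dot-product score below is an assumption chosen for brevity, and the names `attend`, `decoder_state`, and `encoder_features` are hypothetical. In the model described above, `encoder_features` would correspond to the BLSTM outputs and the returned context vector to the intermediate vector passed to the LSTM decoder.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_features):
    """One soft-attention step (dot-product scoring is an assumption).

    decoder_state:    list of floats, the current decoder hidden state
    encoder_features: list of feature vectors, one per image position
    Returns (weights, context): the alignment weights over positions and
    the weighted sum of the feature vectors.
    """
    scores = [
        sum(q * k for q, k in zip(decoder_state, feature))
        for feature in encoder_features
    ]
    weights = softmax(scores)
    dim = len(encoder_features[0])
    context = [
        sum(w * feature[d] for w, feature in zip(weights, encoder_features))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three image positions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, features)
```

With this toy input, the first and third positions receive equal weight and the second receives the least, because its feature vector is orthogonal to the decoder state. No position is cut out of the image explicitly; the weights play the role that a segmentation step plays in the phased approach.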
Keywords/Search Tags: handwritten formula recognition, deep learning, end-to-end, Bayesian program learning