| Mathematical expressions, as a standardized language used all over the world, have broad application prospects in intelligent homework correction and intelligent teaching. Handwritten mathematical expression recognition has received considerable attention from researchers in recent years, yet the field still faces great challenges because of the complex two-dimensional structure of handwritten mathematical expressions, large differences in image size, and variations in handwriting style. The attention-based encoder-decoder framework provides an image-to-sequence method that decodes handwritten mathematical expression images into LaTeX sequences, and it has brought a breakthrough in handwritten mathematical expression recognition. However, because convolutional neural networks are local in nature, a convolutional encoder cannot capture the relationship between two objects that are far apart. For handwritten mathematical expression recognition, which requires capturing the relationships between symbols, this locality is clearly detrimental. Similarly, convolutional neural networks are weak at structure extraction because of the translation invariance of convolution. To address these problems, this paper alleviates the inability of convolution to model long-range dependencies by acquiring global information and fusing it with local information, and uses two methods to enhance the model's structure perception. This paper makes the following improvements to the attention-based encoder-decoder approach. 1. To alleviate the inability of convolution to capture the relationship between distant objects, this paper proposes a method for fusing local and global features. The method uses an encoder with self-attention to extract global features of the image, and then fuses the extracted global features with the 
local features extracted by the convolutional neural network. Experiments show that, compared with the original model, the method improves recognition accuracy by 2%, 3%, and 1% on the CROHME2014, CROHME2016, and OffRaSHME datasets, respectively. 2. To improve the model's perception of the two-dimensional structure of expressions, this paper first defines the two-dimensional structure relations and introduces a method for extracting them, and then proposes two methods, based on these predefined relations, to help the model recognize structure. The first method uses connectionist temporal classification (CTC) to recognize the structure-relationship sequence, which guides the decoding of the handwritten mathematical expression model; the second provides weak supervision for the model's structure recognition through the distribution of structure relationships. Experiments demonstrate that the recognition accuracy of both methods is significantly better than that of the other methods. The sequence-supervised method achieves recognition accuracies of 51.87% and 48.95% on the CROHME2014 and CROHME2016 datasets, and the weakly supervised structural method achieves recognition accuracies of 55.13% and 52.44% on the same datasets. |
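
The local-global fusion idea in the first contribution can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not specify the encoder architecture or the fusion operator, so the sketch assumes single-head scaled dot-product self-attention without learned projections and element-wise addition as the fusion step. The names `self_attention` and `fuse` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over N feature vectors.

    Assumption: queries, keys, and values are the input features
    themselves (no learned projections), to keep the sketch short.
    Each output vector mixes information from every position, which
    is what makes it a "global" feature.
    """
    dim = len(features[0])
    outputs = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ])
    return outputs


def fuse(local_feats, global_feats):
    """Fuse local and global features by element-wise addition.

    Assumption: addition is only one possible fusion operator;
    concatenation or gating would fit the same description.
    """
    return [
        [l + g for l, g in zip(local_vec, global_vec)]
        for local_vec, global_vec in zip(local_feats, global_feats)
    ]


# Toy stand-in for a flattened CNN feature map: 4 positions, 2 channels.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
global_feats = self_attention(local_feats)
fused = fuse(local_feats, global_feats)
```

In this toy input, the last position has an all-zero local feature, so its attention weights are uniform and its fused vector is the mean of all positions. This shows how a position with little local evidence can still receive context from distant positions.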