
An Encoder-decoder With Attention Based Method To Handwritten Mathematical Expression Recognition

Posted on: 2020-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: W B Xiao
Full Text: PDF
GTID: 2428330572479130
Subject: Computer technology
Abstract/Summary:
Mathematical expression recognition is a key technology for the electronic transcription of paper documents. With the spread of touch devices in recent years, online handwritten mathematical expression recognition has also become increasingly important. Progress in this technology improves human-computer interaction and is especially convenient for digital teaching and mathematical document editing. Mathematical expression recognition is a kind of optical character recognition (OCR) problem, but it has its own particular difficulties: writing ambiguity, character ambiguity, segmentation ambiguity, and structural ambiguity. These ambiguities make the problem challenging, so traditional OCR techniques cannot be used to solve it.

By representation format, mathematical expressions come in two forms: printed and handwritten. Handwritten expressions introduce more ambiguity and are therefore harder to recognize, so this paper focuses on them. Depending on how the data is captured, handwritten expressions are further divided into online and offline types. Online means that the expression is represented as dynamic trajectory coordinates; offline means that it is represented as static image pixels. Note that the static image can be obtained from the dynamic trajectory, and vice versa. Because portable touch devices are now widespread, online mathematical expression recognition has attracted growing attention from researchers, and it is the focus of this paper.

Using the CROHME dataset, we develop a system that transcribes the handwritten trajectory of a mathematical formula into a LaTeX symbol sequence. Online handwritten mathematical expression recognition can be regarded as a sequence-to-sequence transcription problem, i.e., a multi-modal sequence-to-sequence learning problem between a trajectory coordinate sequence and a LaTeX symbol sequence. In recent years, researchers have applied the encoder-decoder architecture to multi-modal sequence learning problems such as image captioning and speech recognition, with impressive results, so this paper adopts the same architecture. Considering the characteristics of the task, this paper uses LSTMs for both the encoder and the decoder and incorporates recent results from machine translation, such as the attention and coverage mechanisms. In addition, drawing on practical experience with large-scale models in machine translation, this paper carefully tunes the hyper-parameters of the encoder-decoder model, such as the depth of the LSTM, the hidden unit structure, the hidden unit dimension, and the word embedding dimension. The encoder-decoder model achieves an expression recognition rate of 50.57% on the test set of the 2016 CROHME competition, outperforming the other teams.

Recent studies have shown that pre-trained language models can greatly improve performance on a variety of natural language processing tasks. Therefore, this paper uses the symbol segmentation information in the CROHME dataset and a LaTeX corpus to pre-train the encoder and the decoder, respectively, and then transfers them into the encoder-decoder model for further fine-tuning. Since the encoder and the decoder essentially solve the tasks of trajectory representation and LaTeX grammar learning, respectively, the pre-training auxiliary tasks designed in this paper further improve the expression recognition rate to 58.76%.
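
As a rough illustration of how an expression recognition rate such as the figures above might be computed, the following minimal sketch assumes the rate is the fraction of test expressions whose predicted LaTeX symbol sequence exactly matches the reference sequence. This is an assumption made for illustration: the official CROHME evaluation may compare expressions differently, and the function name and the toy data are hypothetical rather than taken from the thesis.

```python
from typing import Sequence


def expression_recognition_rate(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of expressions whose predicted LaTeX symbol sequence
    exactly matches the reference sequence (assumed metric definition)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression counts as correct only if every symbol matches in order.
    correct = sum(
        1 for pred, ref in zip(predictions, references) if list(pred) == list(ref)
    )
    return correct / len(references)


# Toy, made-up data: four reference expressions as LaTeX symbol sequences.
references = [
    ["x", "^", "{", "2", "}"],
    ["\\frac", "{", "a", "}", "{", "b", "}"],
    ["a", "+", "b"],
    ["\\sqrt", "{", "x", "}"],
]
# Hypothetical model outputs; the second one confuses "b" with "6".
predictions = [
    ["x", "^", "{", "2", "}"],
    ["\\frac", "{", "a", "}", "{", "6", "}"],
    ["a", "+", "b"],
    ["\\sqrt", "{", "x", "}"],
]

rate = expression_recognition_rate(predictions, references)
print(f"{rate:.2%}")  # 3 of 4 expressions match exactly: 75.00%
```

Under this definition a single wrong symbol makes the whole expression count as an error, which is one reason expression-level rates tend to sit well below symbol-level accuracy.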
Keywords/Search Tags: Handwritten Mathematical Expression Recognition, Encoder-Decoder, Pre-training