| Handwritten text recognition (HTR) is a difficult sub-task of optical character recognition, and English handwritten text recognition is in turn a sub-task of HTR. Current mainstream text recognition algorithms fall into three categories: those based on connectionist temporal classification (CTC), those based on attention, and those based on aggregation cross-entropy (ACE). Attention-based methods suffer from attention drift, which becomes more pronounced on long text; they also have many parameters and slow inference. ACE-based methods are very sensitive to text scale and do not converge easily during training. CTC-based methods do not have these problems and are better suited to HTR on long text. However, all of these text recognition algorithms face three major challenges. The first is a lack of data: the datasets required for handwriting recognition usually need extensive manual annotation, so it is difficult to obtain enough data for model training, which limits model performance. The second is the large number of model parameters: existing handwriting recognition models usually have tens of millions or even hundreds of millions of parameters, and the resulting demand for computing resources has become one of the bottlenecks limiting their performance and practical application. The third is that many characters in English text look alike: for example, "O" and "0", "l" and "1", and "S" and "5" are similar in shape and difficult to distinguish accurately. This paper presents three studies that address these problems in handwritten text recognition. (1) To reduce the large parameter count of handwriting recognition models, this paper adopts MobileNetV3, a lightweight backbone network originally designed for image classification, and modifies its architecture to adapt it to text recognition. In addition, extensive ablation experiments are carried out on the long short-term memory (LSTM) network, so that the final text recognition model balances accuracy against parameter count. (2) To address the shortage of handwritten data, this paper proposes a data augmentation method for handwritten text recognition (HTRAug). Extensive ablation experiments on data augmentation are used to select and combine the augmentation methods that benefit handwritten text image recognition, expanding the handwritten dataset as far as possible and improving the generalization ability of the model. (3) To address the problem of similar characters in handwritten data, an enhanced CTC method is proposed. This method combines Center Loss with CTC Loss and initializes the Center Loss parameters using a trained recognition model. Finally, experiments on the IAM dataset yield a recognition error rate of 4.93% with 4.2M model parameters, which improves accuracy by 1% over the latest HTR work while reducing the parameter count by more than 2M. |
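
The enhanced CTC idea in (3) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the per-frame character labels, the use of class-mean features to initialize the centers, and the weight of 0.05 are all assumptions made for illustration. The CTC loss value is taken as an input rather than computed here.

```python
from collections import defaultdict


def init_centers(features, labels):
    """Assumed initialization: one center per character class, set to the
    mean feature vector that a trained recognizer produced for that class."""
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def center_loss(features, labels, centers):
    """Mean over samples of half the squared distance to the class center."""
    total = 0.0
    for feat, lab in zip(features, labels):
        total += 0.5 * sum((f - c) ** 2 for f, c in zip(feat, centers[lab]))
    return total / len(features)


def enhanced_ctc_loss(ctc_loss_value, features, labels, centers, weight=0.05):
    """Weighted sum of a precomputed CTC loss and the center loss.
    The weight is a hypothetical hyperparameter, not a value from the paper."""
    return ctc_loss_value + weight * center_loss(features, labels, centers)


# Toy per-frame features for two similar-looking classes, "O" and "0".
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["O", "O", "0"]
centers = init_centers(features, labels)
total_loss = enhanced_ctc_loss(2.0, features, labels, centers, weight=0.3)
```

The center term pulls features of the same character toward a shared center, which is one plausible way to make visually similar classes such as "O" and "0" easier to separate. In practice the per-frame labels would have to come from somewhere, for example the recognizer's own frame-level predictions, which is also an assumption here.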