Text recognition algorithms have penetrated every aspect of daily life. For example, during the epidemic, medical personnel could photograph nucleic acid test sheets to obtain the results and enter them into the system automatically; highway toll stations equipped with ETC can capture license plate information without staff; and couriers can photograph express waybills to obtain the corresponding information through software when forwarding parcels. There are many further applications, such as card recognition, document recognition, and translation, which show the great value of research on text recognition. This paper studies text recognition algorithms for Chinese text in natural scenes. Such algorithms make it easier to extract useful information from these scenes for processing and reuse, and thereby promote real industrial applications such as the research and practical deployment of intelligent navigation robots for the blind. We choose CRNN and SVTR as the benchmark models and propose four model optimization schemes to improve text recognition accuracy. The main contents are as follows:

1. With the classical and widely used CRNN as the benchmark, the visual feature extraction module in CRNN is replaced with ResNet18 and ResNet34. Their residual connections extract more complete visual features, which reduces information loss during visual feature extraction and improves recognition accuracy. The experimental results show that the recognition accuracy of the CRNN model improves from 54.9% to 56.5% when ResNet18 is used for visual feature extraction, and to 58.6% when ResNet34 is used.

2. The newly released (2022) SVTR-T, which offers the best recognition performance, is taken as the benchmark model. When SVTR-T is trained with gradient descent, it tends to fall into a local optimum or hover around the optimum. To tackle this problem, the paper designs a cyclic cosine learning rate decay strategy that helps the model jump out of local minima. This strategy improves the recognition accuracy of the model from 68% to 70.2%.

3. In the SVTR-T experiments on Chinese text recognition in natural scenes, training is long and the model converges slowly. To solve this problem, the paper uses a pre-trained model: the optimal parameters from the previous training run serve as the initial weights for the subsequent run, which accelerates convergence. The experiments show that the text recognition accuracy of the model improves to 69.8% after adopting the pre-trained model.

4. In the SVTR-T experiments, to alleviate overfitting and compensate for the lack of samples in some scenes, the paper adds two data augmentation methods: common data augmentation and elastic transformation data augmentation. Common data augmentation changes the feature distribution of the samples and reduces the interference of background noise with the model. Elastic transformation data augmentation changes the morphology of the characters in text images, which increases the number of distorted samples and reduces the dependence of the model on character style. The experiments show that the text recognition accuracy of the model improves to 71.2% after adding common data augmentation, and to 72.9% after adding elastic transformation data augmentation on top of it.

5. To further improve the text recognition accuracy of the SVTR-T model, the cyclic cosine learning rate decay strategy, the pre-trained model, and the two data augmentation methods are combined. The experimental results show that the text recognition accuracy on the test set reaches 73.9%, a total improvement of 5.9 percentage points over the 68% baseline, which is remarkable. This result is even 1.8% better than that of SVTR-L, the largest model in the SVTR series.
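
The abstract names the cyclic cosine learning rate decay strategy without giving its formula, so the following is only a minimal sketch of one common form of such a schedule, in the spirit of cosine annealing with restarts. The function name `cyclic_cosine_lr` and the values of `cycle_len`, `lr_max`, and `lr_min` are assumptions chosen for illustration, not the settings used in the paper. Within each cycle the learning rate decays from `lr_max` towards `lr_min` along a half cosine, and at the start of the next cycle it jumps back to `lr_max`, which is what can help the optimizer leave a local minimum.

```python
import math


def cyclic_cosine_lr(step, cycle_len=1000, lr_max=1e-3, lr_min=1e-5):
    """Return the learning rate at `step` for a cosine schedule that restarts
    every `cycle_len` steps. All default values are hypothetical."""
    # Position inside the current cycle, in [0, cycle_len).
    pos = step % cycle_len
    # Half-cosine factor: 1.0 at the start of a cycle, approaching 0.0 at its end.
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_factor


if __name__ == "__main__":
    # Print the rate at the start, middle, and end of the first cycle,
    # and at the restart that opens the second cycle.
    for step in (0, 500, 999, 1000):
        print(step, cyclic_cosine_lr(step))
```

In a training loop, the returned value would be written into the optimizer's learning rate before each update. A schedule whose cycles grow longer over time is a common variant and would only require replacing the fixed `cycle_len` with a per-cycle length.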