| As a carrier of high-level semantic information, text is extremely valuable for scene understanding. Text recognition therefore has broad application prospects in fields such as autonomous driving, product search, instant translation, and online education. In recent years, with the development and application of deep learning, text recognition algorithms have made great progress and achieved good recognition results. However, because text in natural scene images has varied shapes, diverse layouts, complex backgrounds, and numerous sources of interference, recognizing irregular text in complex scenes remains a challenging problem. This paper aims to research and implement a highly robust natural scene text recognition algorithm for irregular text with complex backgrounds. The main work and contributions of this paper are as follows: 1. Proposing a scene text recognition algorithm based on the attention mechanism. Poor robustness on irregular text with complex backgrounds has two main causes: (1) an insufficient receptive field limits the representation ability of the feature extraction network, which then cannot capture the layout of irregular text or the positions of characters; (2) the extracted features are mixed and indistinguishable, which makes the algorithm vulnerable to background interference. To solve these problems, this paper proposes an attention-based text recognition algorithm. First, depending on the backbone, two methods of expanding the receptive field are introduced so that spatial information can be fully utilized. The CNN-based backbone expands the receptive field by replacing standard convolution with deformable convolution. The Transformer-based backbone achieves a global receptive field by using a Vision Transformer structure. After experimental verification and comparison, this paper chooses a Vision Transformer as the backbone 
network. Then, because text recognition differs in character from classification tasks, this paper proposes, for the first time, introducing a character feature filtering module based on the channel-attention mechanism into the decoder. In this way, at each step of recurrent decoding, the features of the current character are selected from the mixed features. Experiments show that the proposed EAF improves recognition performance at a small time cost. EAF reaches the state of the art (SOTA) on regular and irregular scene text recognition benchmarks. Compared with previous methods, the accuracy of EAF on the irregular datasets CUTE80, IC15, and SVTP is improved by 6%, 3.3%, and 3.3%, respectively. 2. Improving the BERT-based Chinese spelling check algorithm. The performance of text recognition algorithms depends closely on the quantity and quality of the datasets, and the training cost is high. In addition, characters are easily misrecognized because many Chinese characters look similar. To solve these problems, this paper proposes applying a Chinese spelling check algorithm in the post-processing stage of the scene text recognition (STR) task. The prediction of the detection network strongly affects the result of the subsequent correction network. However, the detection network of the original model is based on a bidirectional GRU and cannot capture long-range dependencies. Therefore, this paper adopts a detection network based on the Transformer encoder, which improves performance in both detection and correction. Experimental results on three benchmarks verify the idea proposed in this paper. Compared with the original network, the F1 score increases by 1.5% and 1.6% on the error detection and error correction tasks, respectively. |
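
The channel-attention character feature filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the module's design, so everything below is an assumption made for illustration: the function name `channel_attention_filter`, the per-channel decoder `query`, and the scalar weights `w_pool` and `w_query` are hypothetical and are not taken from the paper. The sketch only shows the general idea of gating feature channels with a sigmoid weight at one decoding step.

```python
import math


def sigmoid(x):
    """Logistic function mapping a score to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_filter(features, query, w_pool=1.0, w_query=1.0):
    """Gate each feature channel for one decoding step (illustrative only).

    features: list of C channels, each a list of N spatial activations.
    query:    list of C floats, a hypothetical decoder state for the
              character currently being decoded.
    w_pool, w_query: hypothetical scalar weights (a real module would
              learn these, typically as full weight matrices).
    Returns (gates, filtered), where filtered[c][i] = gates[c] * features[c][i].
    """
    gates = []
    for c, channel in enumerate(features):
        # Squeeze: summarize the channel by global average pooling.
        pooled = sum(channel) / len(channel)
        # Excite: combine the channel summary with the decoder query.
        score = w_pool * pooled + w_query * query[c]
        gates.append(sigmoid(score))
    # Scale: suppress channels with low gates, keep channels with high gates.
    filtered = [[g * v for v in channel] for g, channel in zip(gates, features)]
    return gates, filtered


if __name__ == "__main__":
    feats = [[1.0, 3.0], [2.0, 2.0]]
    # The query favors channel 0 and suppresses channel 1.
    gates, filtered = channel_attention_filter(
        feats, query=[5.0, -5.0], w_pool=0.0, w_query=1.0
    )
    print(gates)
    print(filtered)
```

Calling the function once per decoding step with a different `query` yields a different set of channel gates per character, which is one plausible way to separate character-specific features from a shared, mixed feature map.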