Degraded Chinese Scene Text Recognition Algorithm Based on a Multitask Learning Mechanism

Posted on: 2022-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: J L Chen
GTID: 2518306476990849
Subject: Signal and Information Processing
Abstract/Summary:
Text image recognition technology is widely deployed for offline handwritten character recognition and for recognizing scanned documents with standardized layouts, such as simple backgrounds and uniform font styles. Natural scene text images, however, are degraded by factors such as noise, motion blur, low resolution, shooting angle, and lighting, and recognizing such low-quality images remains technically difficult. This thesis studies the shortcomings of existing algorithms for recognizing low-quality Chinese scene text images. The specific research content is as follows:

(1) Existing methods tend to address low-quality text recognition from the perspective of image reconstruction, an approach that neglects the robustness and generalization of the recognition model. Based on a multi-task learning mechanism, this thesis therefore constructs a network that tackles low-quality text recognition at the level of feature representation. The super-resolution module and the text recognition module are trained jointly so that the shared feature layer acquires more robust feature representations, while the super-resolution module does not participate in the network's inference. Experimental results show that the proposed network improves accuracy by 16.48% over SEED and by 8.77% over TSRN, and that the model also shows stronger generalization ability.

(2) Super-resolution reconstruction networks usually build their loss functions at the pixel, style, and content feature levels, but these losses often discard high-frequency image information and over-smooth textures, which hinders learning the structure of characters in text images. This thesis therefore uses image gradient information to guide the shared feature layer toward learning character edges and character geometric structure. The results show that adding a gradient loss function to the overall network raises text recognition accuracy from 69.12% to 71.25%.

(3) Deeper or wider deep learning networks often capture features at only a single scale and ignore the inherent correlation between features at different layers. This thesis therefore combines the non-local operation with the feature pyramid idea to build a shared feature layer based on a pyramid self-attention unit. By combining feature maps at multiple scales, this module captures long-range information. Experimental results show that adding the pyramid self-attention module raises the recognition accuracy of the proposed algorithm from 70.17% to 71.25%.

(4) For Chinese scene text recognition, this thesis designs a Transformer-based feature sequence decoder to improve recognition accuracy and the model's capacity for parallel computation. The multi-head attention mechanism computes attention in parallel and fuses features through learned attention weights, so the network can capture global image information and the loss of long-range information is alleviated. Experimental results show that, compared with current CTC-based and attention-based decoders, this decoder improves low-quality text recognition accuracy by 1% to 2%.
Keywords: Scene Text Recognition, Multitask Learning, Super Resolution, Non-Local Operation, Transformer
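Point (1) of the abstract describes hard parameter sharing: one shared feature layer feeds both a super-resolution head and a recognition head during training, and only the recognition path runs at inference. The NumPy sketch below illustrates that idea under stated assumptions; it is not the thesis's implementation. The single linear-plus-ReLU shared layer, the linear heads, the per-character cross-entropy (a stand-in for whatever recognition loss the thesis uses), the L1 super-resolution loss, and the weight `lam` are all hypothetical choices.

```python
import numpy as np

# Hypothetical toy model: every layer is a single matrix, chosen only to show
# how the two task losses share one feature layer.

def shared_layer(x, w_shared):
    """Shared feature layer (assumed: linear map followed by ReLU)."""
    return np.maximum(x @ w_shared, 0.0)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over samples (stand-in recognition loss)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(x_lr, hr_target, labels, params, lam=0.1):
    """Training objective: recognition loss + lam * super-resolution loss."""
    feats = shared_layer(x_lr, params["shared"])
    rec_loss = cross_entropy(feats @ params["rec"], labels)
    sr_loss = np.abs(feats @ params["sr"] - hr_target).mean()  # L1 reconstruction
    return rec_loss + lam * sr_loss

def predict(x_lr, params):
    """Inference: the super-resolution head params["sr"] is never used."""
    feats = shared_layer(x_lr, params["shared"])
    return (feats @ params["rec"]).argmax(axis=1)

rng = np.random.default_rng(0)
params = {
    "shared": rng.normal(size=(16, 32)),  # 16-dim LR input -> 32-dim shared features
    "rec": rng.normal(size=(32, 10)),     # 10 hypothetical character classes
    "sr": rng.normal(size=(32, 64)),      # 64-dim HR reconstruction target
}
x_lr = rng.normal(size=(4, 16))
hr = rng.normal(size=(4, 64))
labels = np.array([1, 3, 5, 7])
print(multitask_loss(x_lr, hr, labels, params), predict(x_lr, params))
```

In a real training framework, back-propagating the combined loss would update the shared layer with gradients from both tasks; discarding the super-resolution head afterwards leaves the inference path and its cost unchanged.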
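Point (2) adds a loss on image gradients so that character edges are not smoothed away. This abstract does not give the thesis's exact formulation; the sketch below assumes one common form, the mean L1 distance between forward-difference gradients of the super-resolved output and the high-resolution target, for single-channel images.

```python
import numpy as np

def image_gradients(img):
    """Forward-difference gradients of a 2-D grayscale image of shape (H, W).

    The last column of gx and the last row of gy are left at zero.
    """
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]  # horizontal intensity change
    gy[:-1, :] = img[1:, :] - img[:-1, :]  # vertical intensity change
    return gx, gy

def gradient_loss(sr, hr):
    """Mean L1 distance between the gradient fields of SR output and HR target."""
    sx, sy = image_gradients(sr)
    hx, hy = image_gradients(hr)
    return np.abs(sx - hx).mean() + np.abs(sy - hy).mean()

hr = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))       # sharp vertical edge
blurred = np.tile([0.0, 1/3, 2/3, 1.0], (4, 1))  # same edge, smoothed
print(round(gradient_loss(blurred, hr), 4))  # 0.3333
print(gradient_loss(hr + 5.0, hr))           # 0.0
```

A uniform brightness shift leaves this loss at zero, while smoothing the edge is penalized, which is the sensitivity to edge structure that point (2) motivates.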
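Point (3) combines the non-local operation with a feature pyramid. The abstract does not specify the unit's layout, so the NumPy sketch below follows one plausible design, modeled on pyramid-pooled non-local blocks: every spatial position of a (C, H, W) feature map attends to keys and values pooled at several scales, and the result is added back residually. The projection matrices, the 1/sqrt(D) scaling, and the pooling scales (1×1, 2×2, 4×4) are hypothetical choices.

```python
import numpy as np

def adaptive_avg_pool(x, s):
    """Average-pool a (C, H, W) map to (C, s, s) using near-equal bins."""
    c, h, w = x.shape
    out = np.empty((c, s, s))
    for i in range(s):
        h0, h1 = (i * h) // s, -(-((i + 1) * h) // s)  # floor start, ceil end
        for j in range(s):
            w0, w1 = (j * w) // s, -(-((j + 1) * w) // s)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pyramid_non_local(x, wq, wk, wv, wo, scales=(1, 2, 4)):
    """Non-local attention whose keys/values come from a pooling pyramid.

    x: (C, H, W); wq, wk: (C, D); wv, wo: (C, C). Returns an array shaped like x.
    """
    c, h, w = x.shape
    queries = x.reshape(c, h * w).T @ wq  # (HW, D): one query per position
    pooled = np.concatenate(
        [adaptive_avg_pool(x, s).reshape(c, s * s) for s in scales], axis=1
    ).T                                   # (S, C) with S = sum of s*s
    keys = pooled @ wk                    # (S, D)
    values = pooled @ wv                  # (S, C)
    attn = softmax(queries @ keys.T / np.sqrt(wq.shape[1]), axis=-1)  # (HW, S)
    out = (attn @ values) @ wo            # (HW, C)
    return x + out.T.reshape(c, h, w)     # residual connection

rng = np.random.default_rng(0)
c, d = 8, 4
x = rng.normal(size=(c, 6, 10))  # e.g. a 6x10 feature map of a text line
y = pyramid_non_local(x, rng.normal(size=(c, d)), rng.normal(size=(c, d)),
                      rng.normal(size=(c, c)), rng.normal(size=(c, c)))
print(y.shape)  # (8, 6, 10)
```

Pooling the keys and values to a fixed 21 positions keeps the attention matrix at HW × 21 rather than HW × HW, while the coarse and fine pooled maps still give every position access to context from the whole image.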
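Point (4) relies on multi-head attention. The NumPy sketch below shows the standard scaled dot-product formulation from the Transformer, softmax(QK^T / sqrt(d_head)) V, computed for all heads in one batched product; it is not the thesis's decoder. The weight shapes, the optional boolean `mask` argument, and the example sizes are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q_in, kv_in, wq, wk, wv, wo, num_heads, mask=None):
    """q_in: (Tq, d_model); kv_in: (Tk, d_model); weights: (d_model, d_model).

    mask (optional): boolean array broadcastable to (Tq, Tk); False = blocked.
    Returns the output (Tq, d_model) and attention weights (heads, Tq, Tk).
    """
    tq, d_model = q_in.shape
    tk = kv_in.shape[0]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    def split(x, t):
        # (t, d_model) -> (heads, t, d_head)
        return x.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q_in @ wq, tq), split(kv_in @ wk, tk), split(kv_in @ wv, tk)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, Tq, Tk)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~0 weight
    weights = softmax(scores, axis=-1)
    heads = weights @ v                        # (heads, Tq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(tq, d_model)
    return concat @ wo, weights

rng = np.random.default_rng(0)
d_model, n_heads = 16, 4
w = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)]
chars = rng.normal(size=(5, d_model))    # 5 decoder positions (hypothetical)
visual = rng.normal(size=(12, d_model))  # 12 flattened image-feature positions
out, attn = multi_head_attention(chars, visual, *w, num_heads=n_heads)
causal = np.tril(np.ones((5, 5), dtype=bool))  # decoder self-attention mask
self_out, self_attn = multi_head_attention(chars, chars, *w,
                                           num_heads=n_heads, mask=causal)
print(out.shape, attn.shape)  # (5, 16) (4, 5, 12)
```

Here the decoder positions attend over the whole visual sequence (cross-attention), which is how global image information reaches each character. All heads and query positions are handled by the same batched matrix products, which is the source of the parallelism during training; autoregressive inference still emits characters one at a time.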