Auto STR: Efficient Backbone Search for Scene Text Recognition

Posted on: 2021-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2518306107968009
Subject: Electronics and Communications Engineering
Abstract/Summary:
Scene text recognition (STR) is very challenging because the appearance and layout of text in scene images are diverse and are often accompanied by rich background noise. Current state-of-the-art STR methods usually consist of three modules: (1) a pre-processing module that rectifies irregular scene text images; (2) a feature extraction module that extracts a feature sequence from the rectified text image; and (3) a feature translation module that maps the image feature sequence to a text character sequence. The community has paid increasing attention to boosting performance by improving the pre-processing module, for example through rectification and deblurring, or by improving the sequence translator. However, another basic and critical module, the feature sequence extractor, has not been extensively explored or discussed in depth. The main reason is that manually designing a feature extraction network (a deep convolutional neural network) requires strong domain knowledge and a large amount of experimentation and computing resources. As a result, the feature extraction modules in most current STR methods directly reuse structures designed for object classification. However, object classification differs from other vision tasks, so these structures may be sub-optimal for STR. Neural architecture search (NAS) has succeeded in many visual tasks, such as large-scale object classification, image segmentation, and object detection, where it can identify architectures comparable to or even better than manually designed ones. Inspired by this success, in this work we propose automated STR (Auto STR), which searches data-dependent backbones to boost text recognition performance. We first analyze the feature sequence extractor for the STR task and design a general domain-specific search space, which contains both choices of operations and constraints on the downsampling path. Then, we propose a novel two-step search algorithm that decouples the search over convolution operations from the search over feature downsampling paths, enabling an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, Auto STR outperforms state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.
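
The idea of a search space that combines per-layer operation choices with a constrained downsampling path can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the stride choices, the operation names, the layer count, and the target downsampling ratios are all hypothetical. The sketch only enumerates stride sequences that satisfy a fixed total downsampling constraint and compares the number of candidates in a joint search with the number in a decoupled, two-step search.

```python
from itertools import product
from math import prod

# Hypothetical per-layer stride choices, written as (height_stride, width_stride).
STRIDE_CHOICES = [(1, 1), (2, 1), (2, 2)]
# Hypothetical candidate operations for each layer.
OP_CHOICES = ["conv3x3", "conv5x5", "skip"]


def valid_downsampling_paths(num_layers, total_h, total_w):
    """Enumerate stride sequences whose cumulative strides equal the targets."""
    paths = []
    for path in product(STRIDE_CHOICES, repeat=num_layers):
        h = prod(s[0] for s in path)  # total height downsampling
        w = prod(s[1] for s in path)  # total width downsampling
        if h == total_h and w == total_w:
            paths.append(path)
    return paths


def search_space_sizes(num_layers, total_h, total_w):
    """Count candidates for a joint search versus a decoupled two-step search."""
    n_paths = len(valid_downsampling_paths(num_layers, total_h, total_w))
    n_ops = len(OP_CHOICES) ** num_layers
    return {
        "paths": n_paths,
        "op_assignments": n_ops,
        "joint": n_paths * n_ops,      # every path paired with every op assignment
        "decoupled": n_paths + n_ops,  # paths and ops considered in separate steps
    }


if __name__ == "__main__":
    # Toy setting: 4 layers, height reduced 4x, width reduced 2x.
    print(search_space_sizes(num_layers=4, total_h=4, total_w=2))
```

In this toy setting the joint space grows as the product of the two sets of choices, while the decoupled count grows as their sum, which is one way to see why separating the two decisions can make a search cheaper. The counts assume each step is evaluated exhaustively, which is a simplification and not a description of the search algorithm used in the thesis.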
Keywords/Search Tags: Scene Text Recognition, Neural Architecture Search, Convolutional Neural Network, Automated Machine Learning