Scene text detection and recognition aims to locate and recognize text instances in natural scene images. Because of its wide application in image retrieval, autonomous driving, intelligent robots, and assistive systems for the visually impaired, it has attracted growing attention from researchers in recent years. Text recognition in natural scenes is more difficult than document image recognition, for the following reasons: (1) scene text varies in shape and is irregularly distributed; (2) complex backgrounds interfere with the text; (3) images captured by hand introduce uncertainty in the form of distortion and perspective effects. Although OCR technology is mature and widely used in tasks such as document scanning, it places high demands on image quality and therefore cannot effectively recognize text in natural scenes. In recent years, deep learning has achieved breakthroughs in computer vision tasks such as object detection and instance segmentation, and deep-learning-based scene text detection and recognition has gradually become a research hotspot.

Text detection and recognition in natural scenes is divided into two sub-tasks: scene text detection and scene text recognition. Scene text detection aims to locate text instances in natural scene images, and accurate localization is the basis for the subsequent recognition step. Scene text recognition aims to identify the content of text regions and convert the text in images into character sequences. Using deep learning methods, this paper studies and analyzes the open problems in both scene text detection and scene text recognition. The specific work is as follows.

In scene text detection, complex image backgrounds and the varying sizes of text instances limit detection accuracy. This paper therefore designs a scene text detection model that combines an attention mechanism with adaptive scale fusion. First, an efficient channel attention mechanism is introduced to strengthen the representation ability of the feature extraction network, reducing both the missed-detection rate and the false-positive rate for text. Second, an adaptive scale fusion module is designed to fuse features of different scales dynamically, improving the model's ability to detect and localize text instances of different sizes.

In scene text recognition, text is often curved, irregularly arranged, or perspective-distorted, which greatly increases the difficulty of recognition. This paper therefore designs a scene text recognition model based on a rectification network and the Transformer. First, a rectification network is introduced to normalize the input scene text images, restoring curved and irregularly arranged text to regular, horizontal, straight text and thereby improving recognition. Second, a sequence recognition network is built on the Transformer encoder-decoder architecture to fully exploit textual context information, which assists recognition and improves its efficiency.
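The two detection components can be illustrated with a small NumPy sketch, assuming single-image feature maps of shape (C, H, W). The `eca` function follows the published ECA-Net design (global average pooling, a shared 1-D convolution across neighbouring channels with an adaptively chosen odd kernel size, then a sigmoid gate); the abstract does not state how this paper configures it, so the ECA-Net defaults γ = 2 and b = 1 are assumptions. The `adaptive_scale_fusion` function is entirely hypothetical: the abstract only says that features of different scales are fused dynamically, so per-pixel softmax weighting over maps already resized to a common resolution is one plausible reading, not the paper's actual module. All function names are illustrative, and weights are passed in rather than learned.

```python
import math
from typing import List

import numpy as np


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size from ECA-Net: nearest odd value of |(log2 C + b) / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def eca(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Efficient channel attention on a (C, H, W) map; `weight` is the shared 1-D kernel (length k)."""
    c = x.shape[0]
    k = weight.shape[0]
    y = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    y_padded = np.pad(y, k // 2)             # zero padding keeps the output length at C
    # Local cross-channel interaction: channel i is scored from its k neighbouring channels.
    z = np.array([y_padded[i:i + k] @ weight for i in range(c)])
    attn = sigmoid(z)                        # per-channel gates in (0, 1)
    return x * attn[:, None, None]           # rescale each channel


def adaptive_scale_fusion(features: List[np.ndarray], proj: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL fusion of S same-sized (C, H, W) maps with per-pixel softmax weights.

    `proj` has shape (S, S*C) and stands in for a 1x1 convolution that scores each scale.
    """
    stacked = np.stack(features)             # (S, C, H, W)
    s, c, h, w = stacked.shape
    concat = stacked.reshape(s * c, h, w)    # channel-wise concatenation of all scales
    scores = np.einsum("ok,khw->ohw", proj, concat)  # (S, H, W) score per scale and pixel
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # softmax over the scale axis
    return (weights[:, None] * stacked).sum(axis=0)      # weighted sum -> (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 64
    k = eca_kernel_size(c)
    p3, p4, p5 = (rng.standard_normal((c, 16, 16)) for _ in range(3))  # assumed already resized
    p3 = eca(p3, rng.standard_normal(k))
    fused = adaptive_scale_fusion([p3, p4, p5], rng.standard_normal((3, 3 * c)))
    print(k, fused.shape)
```

In a trained detector both `weight` and `proj` would be learned parameters; the sketch only shows data flow and shapes. The softmax over the scale axis is what makes the fusion "adaptive" here: each pixel can lean on whichever scale scores highest for it, rather than summing all scales with fixed weights.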