Text plays an important role in human-computer interaction. With the rapid development of intelligent robots, unmanned vehicles, and medical diagnosis, text detection and recognition has become an important way to locate and understand object information. Deep learning has brought great progress to text detection and recognition, but results in street scenes remain unsatisfactory because of lighting conditions, blurred text, distorted text, and text with extreme aspect ratios. This thesis studies text detection and recognition in natural street-view scenes. For detection, it proposes one text detection method that incorporates a multiscale module and another that incorporates an attention module; better detection in turn improves the subsequent recognition results. For recognition, it improves the ASTER text recognition algorithm, raising recognition accuracy with only a limited increase in computation and parameter count. The main contents are as follows:

(1) ResNet-50 is used as the backbone network. To address the insufficient features extracted by the feature pyramid network (FPN) and its small receptive field, a receptive field module (RFB) is embedded into the FPN to expand the receptive field, capture the feature information of medium-length text, and strengthen feature extraction and fusion. To improve the representational quality of the feature maps, a polarized self-attention module (PSA) is embedded in the RFB module to process the features that result from down-sampling fusion and addition of the input feature map, which contain excessive noise. To improve the robustness of the detection method, a strip pooling module (SPM) is introduced into the feature fusion module to capture long-range dependencies, addressing the uncertainty of the feature distribution and the poor results of fusing features over long distances.

(2) This thesis designs a detection model for dense, small-scale text. CoT-ResNet is used as the backbone network so that rich context is exploited as fully as possible during feature extraction, and the FPN is combined with an improved feature fusion module that makes full use of the extracted image features and filters out redundant noise. The improved feature fusion module uses spatial attention to extract the location information in the low-level feature map. The middle-level feature map is rich in both location and semantic information, so the module applies a dual-attention mechanism to it. This ensures that the fusion process retains part of the spatial location information as well as the semantic information, while useless information is removed to improve the efficiency of text detection.

(3) To make irregular text easier to recognize, this thesis adopts a multi-object rectification network that transforms the input image so that curved text becomes more regular. To address the heavy computation and large number of parameters in the feature extraction stage of ASTER, the original 3×3 convolutions in the feature extraction network are replaced with depthwise separable convolutions, which effectively reduces the parameter count and computation of the model. In the encoding stage, the original bidirectional long short-term memory network (BiLSTM) is replaced with a bidirectional gated recurrent unit (BiGRU), which improves the running efficiency and text recognition accuracy of the model while reducing computation and the risk of overfitting.
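
The parameter savings described in (3) can be illustrated with a short calculation. The following is a minimal sketch, not the configuration used in this thesis: the channel counts (256 in, 256 out), the recurrent input size (512), and the hidden size (256) are assumed values chosen only for illustration, convolution biases are ignored, and each recurrent gate is assumed to carry a single bias vector (some frameworks store two).

```python
# Illustrative parameter counts; all layer sizes below are assumptions,
# not values taken from the thesis.

def conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

def recurrent_params(input_size, hidden_size, gates, bidirectional=True):
    """Weights of a gated recurrent layer with one bias vector per gate.

    gates=4 corresponds to an LSTM, gates=3 to a GRU.
    """
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    per_direction = gates * per_gate
    return per_direction * (2 if bidirectional else 1)

# Hypothetical layer sizes.
C_IN, C_OUT = 256, 256
INPUT_SIZE, HIDDEN_SIZE = 512, 256

standard = conv_params(C_IN, C_OUT)
separable = depthwise_separable_params(C_IN, C_OUT)
bilstm = recurrent_params(INPUT_SIZE, HIDDEN_SIZE, gates=4)
bigru = recurrent_params(INPUT_SIZE, HIDDEN_SIZE, gates=3)

print(f"standard 3x3 conv:        {standard}")
print(f"depthwise separable conv: {separable} ({separable / standard:.1%} of standard)")
print(f"BiLSTM layer:             {bilstm}")
print(f"BiGRU layer:              {bigru} ({bigru / bilstm:.1%} of BiLSTM)")
```

Under these assumptions the separable convolution keeps roughly 1/C_out + 1/k² of the standard convolution's weights, and the BiGRU layer keeps three quarters of the BiLSTM layer's weights, because it has three gates instead of four.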