Image tasks in complex environments, such as bad weather, motion blur, background clutter, and poor imaging quality, remain a major challenge in computer vision, and locating and recognizing ship plate text at ports and terminals is a classic example of such a scenario. Studying text localization and recognition in complex environments is therefore of great practical significance. This paper studies object detection and text image recognition in complex environments and verifies the proposed methods on multiple datasets. For object detection, it focuses on the feature map offset introduced by upsampling and downsampling in the feature pyramid and proposes a non-local feature alignment and fusion enhancement algorithm. For text image recognition, it focuses on the low imaging quality that makes text hard to recognize and proposes a text image super-resolution algorithm guided by semantic and structural information. The specific research content and innovations of this paper are as follows:

1. To address the misalignment of feature map distributions caused by the loss of detail during upsampling and downsampling in feature pyramids, a feature distribution alignment method based on the non-local idea is proposed to improve localization and detection accuracy. First, the traditional non-local algorithm is extended from a single-scale to a multi-scale feature space, and biased features are aligned through global feature attention. Next, a neighborhood similarity algorithm is designed that replaces attention between single pixels with attention between the neighborhoods around each pixel. This improves the model's ability to capture correlations between different regions and to resist noise, making it more robust. Finally, because the non-local approach pays little attention to correlations along the channel dimension, a channel fusion enhancement algorithm is designed. It compresses the spatial information of each feature map with two kinds of global pooling to compute channel attention and works together with the non-local alignment module. The two modules are complementary, attending to the questions of "what" and "how much" respectively, which compensates for the missing channel information in the non-local alignment operation.

2. To address the low imaging quality and recognition difficulty of text images in natural scenes, this paper designs a text image super-resolution model guided by both semantic and structural information, making effective use of the text semantics and character structure naturally present in text images. First, the model includes a text semantic interpreter that uses prior knowledge of text to learn the semantic information in a text image; this semantic information then guides the super-resolution network as it super-resolves the low-resolution text image. Second, the vision graph neural network (ViG) structure replaces the traditional convolutional neural network (CNN) structure, and a context orthogonal attention module is designed, so that the structural and edge information of text images is extracted more effectively and used to guide super-resolution. Finally, several losses specialized for text image super-resolution are designed: a gradient contour sharpening loss, an edge guidance loss, a text structure consistency loss, and a text prior loss. Through this targeted use of high-level semantic and structural information, the model improves super-resolution performance, the clarity of text images, and the accuracy of downstream text recognition.

3. Ship plate images were collected at port terminals to build two ship plate datasets: the Ship Plate Positioning Detection Dataset and the Ship Plate Text Dataset. Experiments on public datasets and on these ship plate datasets verify the feasibility of the proposed models and demonstrate their effectiveness on ship plate data.
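The neighborhood similarity idea in item 1 can be illustrated with a small sketch. This is a minimal pure-Python example under stated assumptions, not the author's implementation. It works on a single scale and omits both the multi-scale extension and the channel fusion module. It also uses identity functions in place of the learned θ/φ/g embeddings of a standard non-local block. Finally, it assumes that the similarity between two positions is the mean dot product of corresponding pixels in the (2r+1)x(2r+1) windows around them. All function names are hypothetical.

```python
import math


def pixel_vector(feat, y, x):
    """Return the C-dimensional feature vector at (y, x); feat is indexed [c][y][x]."""
    return [channel[y][x] for channel in feat]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def neighborhood_similarity(feat, p, q, radius):
    """Assumed neighborhood similarity: compare windows around p and q, not single pixels.

    For every offset d in the window, if both p+d and q+d lie inside the map, the dot
    product of their feature vectors is accumulated; the mean over valid offsets is
    returned. radius=0 reduces to the pixel-to-pixel dot product of a plain non-local block.
    """
    height, width = len(feat[0]), len(feat[0][0])
    total, count = 0.0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px, qy, qx = p[0] + dy, p[1] + dx, q[0] + dy, q[1] + dx
            if 0 <= py < height and 0 <= px < width and 0 <= qy < height and 0 <= qx < width:
                total += dot(pixel_vector(feat, py, px), pixel_vector(feat, qy, qx))
                count += 1
    return total / count  # count >= 1 because offset (0, 0) is always valid


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def neighborhood_non_local(feat, radius=1):
    """Single-scale non-local block using neighborhood similarity and identity embeddings.

    Output at each position = input + attention-weighted sum over all positions
    (residual connection, as in a standard non-local block).
    """
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    positions = [(y, x) for y in range(height) for x in range(width)]
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for p in positions:
        weights = softmax([neighborhood_similarity(feat, p, q, radius) for q in positions])
        for c in range(channels):
            aggregated = sum(w * feat[c][qy][qx] for w, (qy, qx) in zip(weights, positions))
            out[c][p[0]][p[1]] = feat[c][p[0]][p[1]] + aggregated
    return out


# Usage: a toy 2-channel, 3x3 feature map (values chosen arbitrarily for illustration).
feature_map = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],
]
result = neighborhood_non_local(feature_map, radius=1)
```

For interior positions, averaging over a window means that a single noisy pixel contributes only one of up to (2r+1)^2 terms to each similarity score. This is one plausible reading of the claim that neighborhood attention resists noise. Setting `radius=0` recovers ordinary pixel-wise non-local attention, which makes the two variants easy to compare.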