Research On Text Detection And Recognition Algorithm Based On Deep Learning In Natural Scenes

Posted on: 2024-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: K Li
GTID: 2568307100489034 | Subject: Electronic information
Abstract/Summary:
Writing has long been an important means of recording information, exchanging ideas, and spreading culture. In the information age, a large amount of text is disseminated in the form of images, which has made text detection and recognition in natural scenes an active research topic. However, the variability of text shape, size, and orientation in natural scenes, together with background noise and varying shooting angles, degrades detection and recognition performance. In recent years, with the development of deep learning, more and more models have been applied in this field. To address the shortcomings of current text detection and recognition algorithms in natural scenes, this thesis proposes a text detection algorithm based on an improved YOLOv5 to improve text detection in complex scenes. For text recognition, it proposes an algorithm based on a CNN and a self-attention mechanism to improve recognition accuracy on irregular and variable-length text. The main contributions are as follows:

(1) For text detection, YOLOv5s is selected as the base network. The CSP1_X module in the YOLOv5s backbone generates many redundant, similar feature maps during computation, which reduces the detection speed of the model. This thesis therefore replaces the CSP1_X module with a purpose-designed lightweight CSP_B module, reducing the model's computation and parameter count. Experiments show that this change significantly increases detection speed.

(2) To make the detection model focus better on extracting text features, the GAM attention module is added to the YOLOv5s backbone so that more dimensional information is retained. In addition, because the G-IOU loss function loses relative position information, which leads to slow and inefficient convergence, this thesis replaces it with the S-IOU loss function to improve the robustness of the detection model. Ablation experiments show that these improvements effectively raise the accuracy and recall of the detection model.

(3) For text recognition, the model is divided into three parts: a feature extraction module, an encoder, and a decoder. To better extract features of irregular text in complex natural scenes, the lightweight ConvNeXt is selected as the feature extraction module. To better recognize variable-length text, a Transformer is used as the encoder; its self-attention mechanism and parallel processing capability reduce the loss of temporal features and improve accuracy. To further improve the accuracy of the predictions, the decoder is a bidirectional parallel Transformer decoder. Experiments show that the proposed model effectively improves text recognition accuracy.
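The remark about the G-IOU loss losing relative position information can be illustrated with a small sketch. The code below is not taken from the thesis; it is a minimal plain-Python implementation of the standard generalized IoU (commonly written GIoU) for axis-aligned boxes, with the box format `(x1, y1, x2, y2)`, the function names, and the example boxes all chosen here for illustration. It shows one well-known case, assumed to be the kind of situation the abstract refers to: when a predicted box lies entirely inside the target box, GIoU reduces to plain IoU and gives the same value wherever the prediction sits. The S-IOU loss itself is not implemented here.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Area of the smallest box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU = IoU minus the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def center_offset(box_a, box_b):
    """Euclidean distance between the centers of two boxes."""
    acx, acy = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bcx, bcy = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(acx - bcx, acy - bcy)


# Hypothetical example: one target box and two same-sized predictions inside it.
target = (0, 0, 10, 10)
pred_center = (4, 4, 6, 6)  # centered on the target
pred_corner = (0, 0, 2, 2)  # pushed into a corner of the target

for name, pred in (("center", pred_center), ("corner", pred_corner)):
    print(name, "GIoU:", giou(target, pred), "offset:", center_offset(target, pred))
```

Both predictions receive the same GIoU even though their center offsets differ, so a loss of the form 1 - GIoU cannot tell them apart. Losses that add a term depending on the offset between box centers are designed to separate such cases.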
Keywords/Search Tags: deep learning, attention mechanism, text detection, text recognition, Transformer