
Research On Scene Text Detection Algorithm Combining Dual Attention Mechanism And Dilated Convolution

Posted on: 2022-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: K L Du
GTID: 2518306563463204
Subject: Computer technology
Abstract/Summary:
Scene text detection is an important research topic in computer vision. Scene text is a key carrier of information, and text detection technology has been widely applied in image/video understanding, visual search, product identification, autonomous driving, target localization, and other fields. As a result, scene text detection has attracted wide attention from researchers. Early work mostly used text box regression for detection. This approach is efficient but is limited to rectangular text. More recently, many researchers have adopted networks based on pixel-level segmentation to detect irregular text and have achieved many effective results, but several problems remain. First, these networks struggle to maintain high detection speed while improving accuracy, which makes them difficult to deploy in real scenes. Second, text in natural scenes is often blurred or distorted and subject to complex background interference, which can lead to missed or false detections. Finally, the network's convolution operations cause a loss of spatial information, and the segmentation operation lacks guidance from context information, both of which reduce detection performance. To address these problems, this thesis studies how to build a more efficient and accurate scene text detection model. The main contributions are as follows:

(1) A bidirectional feature pyramid network with a dual attention mechanism is proposed. It effectively improves text detection accuracy while maintaining detection speed. First, a weighted bidirectional feature pyramid network is used to improve the existing feature pyramid network. The improved structure allows the model to perform multi-scale feature fusion repeatedly while still detecting in real time. Separable convolution reduces the number of parameters and improves computational efficiency. The fast normalization method learns the importance of input features from different layers, improving the representational and discriminative power of the features. Then, to address the lack of context guidance in the segmentation operation, two attention mechanisms are used to capture contextual relationships. These enrich spatial information and further strengthen the model's feature representation, which improves detection accuracy. Experimental results show that the bidirectional feature pyramid network combined with the dual attention mechanism not only better detects irregular text with varying orientation or rotational distortion, but also better handles difficult cases caused by blur, occlusion, illumination, and similar factors. On the ICDAR2015 dataset, the proposed model's F-measure is 1.09% higher than that of the baseline method while maintaining detection speed.

(2) A segmentation network with dilated convolution is proposed. First, stacked convolution layers lose detailed spatial information. To address this, dilated convolution is added to the segmentation network, which effectively enlarges the receptive field without reducing resolution. In addition, hybrid dilated convolution sets different dilation rates at different layers to capture multi-scale information and reduce spatial information loss. The model's parameters were determined through extensive parameter-tuning experiments. Experimental results show that the segmentation network with dilated convolution has fewer model parameters and effectively improves detection accuracy by enlarging the receptive field. With the lightweight ResNet-18 backbone, the F-measure on the public ICDAR2015, Total-Text, and TD500 datasets increases by 1.22%, 1.19%, and 2.14%, respectively, compared with the baseline algorithm.
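The abstract does not spell out the fast normalization formula. The sketch below is minimal and assumes the thesis follows the fast normalized fusion rule introduced with the weighted BiFPN in EfficientDet. In that rule, each input feature map I_i gets a learnable weight w_i, kept non-negative by ReLU, and the output is O = Σ_i w_i I_i / (ε + Σ_j w_j). The function name `fast_normalized_fusion`, ε = 1e-4, and the toy feature maps are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps with learnable, non-negative weights.

    Assumed rule (EfficientDet-style): O = sum_i w_i * I_i / (eps + sum_j w_j),
    where ReLU clips each raw weight at zero before normalization.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU on weights
    norm = w / (eps + w.sum())  # normalized weights, each in [0, 1)
    return sum(n * f for n, f in zip(norm, features))

# Two hypothetical 1-channel 2x2 maps from adjacent pyramid levels,
# assumed to be already resized to the same resolution.
p_in = np.ones((1, 2, 2))
p_up = np.full((1, 2, 2), 3.0)
fused = fast_normalized_fusion([p_in, p_up], weights=[1.0, 1.0])
print(fused[0, 0, 0])
```

The weights are divided by their sum instead of being passed through a softmax. This avoids exponentials, which is what EfficientDet calls "fast". A raw weight that goes negative is clipped to zero, so its input is simply ignored.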
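The sketch below makes the parameter savings of separable convolution concrete. It assumes "separable convolution" means depthwise separable convolution, as used in the EfficientDet BiFPN. It counts the weights of a standard k × k convolution and of a depthwise separable one, which is a per-channel k × k depthwise convolution followed by a 1 × 1 pointwise convolution. The 3 × 3 kernel and 256 channels are hypothetical sizes chosen for illustration, since the abstract does not state the channel widths.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer size: 3x3 kernel, 256 input and 256 output channels.
std = conv_params(3, 256, 256)
sep = separable_conv_params(3, 256, 256)
print(std, sep, round(sep / std, 4))  # 589824 67840 0.115
```

For these sizes, the separable version needs about 11.5% of the weights. This agrees with the general ratio 1/C_out + 1/k².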
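The abstract says only that two attention mechanisms are used to add context to the segmentation features. One common pairing for this purpose is the position and channel attention modules of DANet. The sketch below assumes that design and is not taken from the thesis.

- Position attention lets every pixel aggregate features from all pixels through an N × N affinity matrix (N = H × W).
- Channel attention does the same across channels with a C × C matrix.
- The two branch outputs are summed.

The learned 1 × 1 convolutions are represented as plain projection matrices `wq`, `wk`, and `wv`. These names and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat, wq, wk, wv, alpha):
    """Spatial self-attention: each pixel is a weighted sum over all pixels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # C x N
    q, k, v = wq @ x, wk @ x, wv @ x      # 1x1-conv projections as matrix products
    attn = softmax(q.T @ k, axis=-1)      # N x N; row i holds pixel i's weights
    out = v @ attn.T                      # C x N; column i = sum_j attn[i, j] * v[:, j]
    return (alpha * out + x).reshape(c, h, w)

def channel_attention(feat, beta):
    """Channel self-attention: each channel is a weighted sum over all channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # C x N
    attn = softmax(x @ x.T, axis=-1)      # C x C channel affinities
    out = attn @ x                        # C x N
    return (beta * out + x).reshape(c, h, w)

def dual_attention(feat, wq, wk, wv, alpha=1.0, beta=1.0):
    # Element-wise sum of the position and channel branches.
    return position_attention(feat, wq, wk, wv, alpha) + channel_attention(feat, beta)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))   # C=8, H=W=4 (hypothetical sizes)
wq = rng.standard_normal((2, 8))        # query projection, C -> 2
wk = rng.standard_normal((2, 8))        # key projection, C -> 2
wv = rng.standard_normal((8, 8))        # value projection, C -> C
out = dual_attention(feat, wq, wk, wv, alpha=0.5, beta=0.5)
print(out.shape)  # (8, 4, 4)
```

In DANet, the scale factors `alpha` and `beta` start at zero and are learned. Each branch therefore begins as an identity mapping and gradually mixes in context. The position branch needs an N × N matrix, so its memory cost grows quadratically with the number of pixels.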
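Dilated convolution with rate r spaces the k taps of a kernel r − 1 pixels apart. This gives an effective kernel size of k + (k − 1)(r − 1) with no extra parameters and no downsampling. Stacking layers with the same rate leaves holes in the receptive field, known as the "gridding" problem. Hybrid dilated convolution avoids it by varying the rate from layer to layer.

The sketch below works in one dimension (per image axis) and checks coverage directly. The rate lists [2, 2, 2] and [1, 2, 5] are illustrative, since the abstract does not report the rates the thesis finally selected. The helper names are hypothetical.

```python
def effective_kernel(k, r):
    """Span of a k-tap kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)

def receptive_field(k, rates):
    """Receptive field of stacked stride-1 dilated convolutions (1-D)."""
    return 1 + sum(effective_kernel(k, r) - 1 for r in rates)

def covered_offsets(k, rates):
    """Input offsets that actually contribute to one output position."""
    taps = range(-(k // 2), k // 2 + 1)  # assumes an odd kernel size k
    offsets = {0}
    for r in rates:
        offsets = {o + t * r for o in offsets for t in taps}
    return offsets

def has_gridding(k, rates):
    """True if the receptive field contains positions that are never sampled."""
    return len(covered_offsets(k, rates)) < receptive_field(k, rates)

for rates in ([2, 2, 2], [1, 2, 5]):
    print(rates, receptive_field(3, rates), has_gridding(3, rates))
# [2, 2, 2] 13 True
# [1, 2, 5] 17 False
```

Both stacks use three 3-tap kernels. Even so, [1, 2, 5] reaches a wider receptive field and samples every position in it, whereas [2, 2, 2] samples only every second position.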
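The improvements above are reported as F-measure, the harmonic mean of precision and recall over detected text regions. The sketch below computes it from match counts. It assumes detections have already been matched to ground truth under each benchmark's own protocol (for ICDAR2015, typically an IoU threshold of 0.5). The counts in the example are made up.

```python
def f_measure(tp, fp, fn):
    """Precision, recall, and F-measure from matched-detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed texts.
p, r, f = f_measure(tp=90, fp=10, fn=30)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.900 R=0.750 F=0.818
```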
Keywords: Text detection, Natural scene image, Semantic segmentation, Attention mechanism, Dilated convolution