Research On Scene Text Detection Technology Based On Multi-Scale Information Fusion

Posted on: 2023-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Yu
GTID: 2558306845991289
Subject: Computer technology
Abstract/Summary:
With the development of Internet technology, image data is growing explosively. As one means of extracting text information from images, scene text detection is widely used in image retrieval, autonomous driving, visual question answering and navigation for the blind. Because its accuracy is easily degraded by occlusion, blur, uneven illumination and complex backgrounds, scene text detection has attracted wide attention from researchers. Early work mainly detected scene text by text box regression. This approach is limited by the shape of the text box and generally cannot detect text of arbitrary shape. In recent years, some scholars have proposed scene text detection methods based on segmentation, which detect irregular text by classifying pixels into text and non-text regions, with satisfactory results. However, these methods still have shortcomings. First, the model's feature representation ability is insufficient: when an image is blurred, low in resolution, occluded or unevenly illuminated, missed detections and false detections are likely. Second, feature extraction relies on convolution alone, so the segmentation step lacks guidance from global semantic information, which hurts detection performance. To address these problems, this thesis studies scene text detection algorithms. The main contents are as follows:

(1) To address occlusion, blur and low resolution in scene text images, an enhanced feature pyramid network integrating multi-scale features is proposed. First, multi-scale features are extracted by a multi-branch dilated convolution module and fused by a feature pyramid network, so that multi-scale feature extraction and fusion are separated. To keep the model fast, separable convolution is incorporated to reduce the number of parameters and improve detection speed. Second, the feature pyramid network is improved by adding a multi-scale dual attention module, which adaptively enhances useful features and suppresses useless ones in the spatial and channel dimensions respectively, improving the feature representation ability of the model. Experimental results on public datasets show that the enhanced feature pyramid network improves detection performance and that, under occlusion, blur and uneven illumination, the false detection and missed detection rates are significantly reduced. On the Total-Text dataset, the F value of the proposed method is about 1% higher than that of the benchmark method.

(2) To address the complex backgrounds of scene text images, a segmentation network integrating a transformer encoder is proposed. Adding a transformer encoder module introduces a global self-attention mechanism into the segmentation network, so that the features extracted by the model contain global semantic information, which guides the segmentation operation and improves detection performance. The results show that with the ResNet-18 network, the F value of the proposed method is 1.20%, 0.80% and 1.50% higher, respectively, than that of the benchmark method; with ResNet-50, the F value of the proposed method reaches 88.10%, 85.40% and 86.00% respectively, which is better than methods of recent years.
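The abstract gives no layer sizes, dilation rates or formulas, so the following is only a minimal sketch of three quantities it relies on: the parameter saving from separable convolution, the enlarged receptive field of a dilated convolution, and the F value. It assumes that "separable convolution" means the common depthwise-separable form and that the F value is the usual harmonic mean of precision and recall; all function names and numbers are illustrative and are not taken from the thesis.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weight count of a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored). Assumed form only."""
    return k * k * c_in + c_in * c_out


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of 'F value')."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
    print(standard_conv_params(3, 256, 256))
    print(separable_conv_params(3, 256, 256))
    # Hypothetical dilation rates for the branches of a multi-branch module.
    print([effective_kernel_size(3, d) for d in (1, 2, 3)])
    # Hypothetical precision and recall values.
    print(f_measure(0.9, 0.8))
```

Under these assumptions, the separable layer needs far fewer weights than the standard one, and each extra dilation step widens the area a 3 x 3 kernel sees without adding weights, which is the intuition behind using several dilation rates in parallel.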
Keywords/Search Tags: Scene text detection, Instance segmentation, Dilation convolution, Attention mechanism