Research On Text Detection Method In Natural Scene Image

Posted on: 2022-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: W L Luo
GTID: 2518306527483014
Subject: Computer Science and Technology
Abstract/Summary:
To address the shortcomings of existing text detection methods for natural images, this thesis designs an end-to-end text detection framework based on convolutional neural networks, with the aim of improving detection accuracy and reducing model complexity. The specific research contents are as follows:

1. Text detection method based on adaptive feature selection and a scale-aware loss function. Text in everyday scenes is highly diverse and often appears against cluttered backgrounds. To address these problems, this thesis proposes a neural network with adaptive feature selection. The network consists of two parts. The first part uses ResNet50 as the backbone to extract features; in stages conv3_x to conv5_x, the standard convolutions are replaced with deformable convolutions. The second part is composed of a feature pyramid network and an adaptive feature selection module. This design is conducive to curved text detection and to filtering out background information. Since the scale of the text in an image is closely related to the height of the text, this thesis also proposes a scale-related loss function, which helps reduce missed detections of small text. Experimental results on four public datasets show that the method greatly improves text detection accuracy while adding only a few parameters.

2. Text detection method based on multi-level feature fusion and an attention mechanism. Variations in scale and the diversity of text layouts, caused by the camera's viewing angle and the size of the text itself, still pose challenges for text detection. This thesis proposes a neural network with multi-level feature fusion and an attention mechanism to detect text. The network is mainly composed of two parts. The first part uses ResNet as the backbone, sends the features of the conv4_x layer to a Dilated Convolutional Pooling Network (DCPN), and then concatenates the features produced by the DCPN module with those produced by the feature pyramid network. The second part is an attention module applied to the features obtained from the first part. Building on channel attention, this module replaces the fully connected layer with a convolution to exchange information between channels: each channel interacts with its k neighboring channels, which enhances the feature extraction ability of the network. Experiments on the ICDAR2015, TD500, CTW1500, and Total-Text datasets show that the method significantly improves text detection accuracy in natural scene images and outperforms state-of-the-art methods to a certain extent.

3. Text detection method based on cascaded feature fusion. Complexity and accuracy have always constrained each other: generally speaking, the higher the accuracy, the higher the complexity, which makes text detection harder to deploy in applications. This thesis proposes a text detection network with low complexity and high accuracy. The network is mainly composed of a feature extraction part and two cascaded feature enhancement modules. In the first part, ResNet is used to extract basic features. The attention mechanism is integrated into multiple residual block units, so that the features of adjacent channels can interact repeatedly at very small computational cost. After the multi-level features are obtained, their dimensionality is reduced to lower the amount of computation. The second part is the cascaded feature fusion part, in which a "U"-shaped network structure (FPEM) enhances the features in both the top-down and bottom-up directions. Because the FPEM module has very few parameters, it can be cascaded to fuse features fully. Finally, the feature maps of different scales produced by FPEM are combined by element-wise addition. Results on four public datasets show that the method reduces the number of model parameters while improving accuracy.
Keywords/Search Tags:Text detection, Multi-scale features, Attention mechanism, Loss function, Deformable convolution
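
The channel attention in the second contribution (a convolution in place of the fully connected layer, with each channel interacting with its k neighboring channels) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `channel_attention`, the argument `kernel`, the zero padding at the channel boundaries, and the use of global average pooling followed by a sigmoid are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math


def channel_attention(features, kernel):
    """Rescale channels using a 1D convolution across channel descriptors.

    features: list of C channels, each a list of H rows of W floats.
    kernel:   list of odd length k; hypothetical weights of the 1D
              convolution (in a real network these would be learned).
    Returns (scaled_features, channel_weights).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]

    # Zero-pad the descriptor vector (an assumption) so that every
    # channel has k neighbours, including the first and last channels.
    padded = [0.0] * pad + pooled + [0.0] * pad

    # 1D convolution over the channel axis, then a sigmoid gate.
    weights = []
    for c in range(len(pooled)):
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))

    # Multiply every value in a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(features, weights)
    ]
    return scaled, weights


# Three 2x2 channels with constant values 1, 2 and 3.
feats = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
scaled, weights = channel_attention(feats, [1.0, 1.0, 1.0])
```

Because the convolution shares one small kernel across all channels, the number of attention parameters in this sketch is k, independent of the number of channels, which is in line with the abstract's emphasis on adding few parameters.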