Scene Text Detection Algorithm Based On Anchor-free Deep Learning Network

Posted on: 2023-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Ni
GTID: 2568306836469144
Subject: Circuits and Systems
Abstract/Summary:
Text in natural scene images carries rich semantic information. Extracting it accurately supports machine scene understanding, the development of artificial intelligence, and industrial automation, which makes scene text detection an important research topic. With the rapid development of deep learning, many deep-learning-based text detection methods have emerged. Unlike general objects, text in scene images varies widely in scale and is easily disturbed by complex backgrounds. Deep-learning-based scene text detection algorithms fall roughly into three categories: regression-based methods, segmentation-based methods, and hybrid methods that combine regression and segmentation. Regression-based algorithms are further divided into anchor-based and anchor-free methods, depending on whether anchor boxes are used. In recent years, anchor-free text detection has attracted much attention because of its simple and elegant network structure. This thesis focuses on anchor-free scene text detection, and its main contributions are as follows.

(1) An improved EAST algorithm based on a residual structure is proposed. First, several residual modules are added after each convolution block of EAST, which enlarges the receptive field by increasing the network depth while mitigating the vanishing-gradient problem. Second, the loss function is improved by adding the distance between the center of the predicted text box and the center of the ground-truth text box as a penalty term. This addresses the problem that the traditional IoU loss provides no gradient when the predicted box and the ground-truth box do not intersect. The algorithm was tested on the ICDAR2015 and MSRA-TD500 datasets; compared with EAST, the detection accuracy is significantly improved.

(2) An improved EAST algorithm based on the receptive field module (RFB) and the stroke width transform (SWT) is proposed. It makes two changes to EAST. First, inspired by the receptive fields of the human visual system, the RFB module combines conventional convolution with dilated convolutions of different dilation rates and concatenates the resulting channels. An RFB module with a stride of 2 replaces the last convolution layer and pooling layer of each stage of the feature extraction network to make the feature description more stable, and another RFB module is added at the last stage to enlarge the receptive field. Second, an SWT stage is added after the non-maximum suppression stage. It extends each predicted text box to both sides according to certain rules, applies the stroke width transform, and determines from conditions whether the extended area contains text, so that long text lines are covered in full. Experiments on the ICDAR2017 RCTW and MSRA-TD500 datasets show that the algorithm not only improves text localization accuracy but also greatly improves the detection of long text.

(3) A CornerNet-based scene text detection algorithm is proposed. It matches top-left and bottom-right key points using center coordinates, which carry location information, instead of embedding vectors. The algorithm locates a text box by detecting a pair of key points at its top-left and bottom-right corners. For each key point, a vector pointing from its position to the center of the target text is predicted, and a center point is generated from that vector. If the predicted center points of two key points are close to each other and both lie in the central region of the predicted box, the two key points are matched. In addition, a centripetal vector loss replaces the original push-pull loss in the loss function. Compared with CornerNet on the ICDAR2015 dataset, the accuracy is improved significantly.
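
The center-distance penalty described in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption: the function name iou_loss_with_center_penalty is hypothetical, boxes are simplified to axis-aligned (x1, y1, x2, y2) tuples rather than the rotated boxes EAST predicts, the base term is 1 - IoU, and the penalty is the squared center distance normalized by the squared diagonal of the smallest enclosing box (a DIoU-style form). It only shows why such a term keeps the loss informative when the boxes do not overlap.

```python
def iou_loss_with_center_penalty(pred, gt):
    """Illustrative loss: (1 - IoU) plus a normalized center-distance penalty.

    pred, gt: axis-aligned boxes as (x1, y1, x2, y2) with x1 < x2, y1 < y2.
    This is an assumed DIoU-style form, not the thesis's exact formula.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h

    # Union area and IoU.
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (py1 + py2) / 2.0 - (gy1 + gy2) / 2.0
    center_dist_sq = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(px2, gx2) - min(px1, gx1)
    enclose_h = max(py2, gy2) - min(py1, gy1)
    diag_sq = enclose_w ** 2 + enclose_h ** 2

    penalty = center_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return 1.0 - iou + penalty


if __name__ == "__main__":
    gt_box = (0.0, 0.0, 1.0, 1.0)
    near = (2.0, 0.0, 3.0, 1.0)  # disjoint from gt_box, close to it
    far = (4.0, 0.0, 5.0, 1.0)   # disjoint from gt_box, farther away
    print(iou_loss_with_center_penalty(near, gt_box))
    print(iou_loss_with_center_penalty(far, gt_box))
```

With a plain 1 - IoU loss, both disjoint boxes in the usage example would score exactly 1, so the loss could not tell the nearer prediction from the farther one. With the penalty term the farther box receives a larger loss, which is the behavior the abstract attributes to the improved loss function.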
Keywords/Search Tags: text detection, convolutional neural network, anchor, residual structure, receptive field module