| Scene text detection aims to detect text regions in images and localize them with bounding boxes, and it is an active research topic in computer vision. Recently, deep learning has greatly advanced many visual tasks, including scene text detection. However, variations in the size, orientation, and aspect ratio of scene text, together with complicated backgrounds, make scene text detection a challenging task. Although deep-learning-based scene text detection algorithms achieve good performance, the following problems remain key challenges: (1) a top-down path is employed to merge multi-scale features, but the dilution of semantic information in this process is ignored; (2) the insufficient discriminative ability of the network causes too many false detections; (3) overly simple interpolation-based up-sampling methods lead to insufficient feature expression ability. To alleviate these problems, two deep-learning-based natural scene text detection algorithms are proposed. (1) A text detector named Semantic-compensated and Attention-guided Network (SANet) is proposed to alleviate semantic information dilution and false detections. It contains a Semantic Compensation Module (SCM) and a Text Attention Module (TAM). Specifically, SCM compensates high-level semantic information directly into the features at all levels of the top-down path via a series of semantic flows, which alleviates the dilution of semantic information. In addition, TAM encodes strong supervision information into the convolutional features, which significantly enhances text-related features and thereby improves the discriminative ability of the network and reduces the number of false detections. SANet is evaluated on three commonly used multi-oriented scene text datasets (ICDAR 2015, ICDAR2017-MLT, and MSRA-TD500), and the quantitative results demonstrate its effectiveness. (2) To address the problem of
insufficient feature expression ability, an end-to-end trainable scene text detection model is proposed. The proposed Back Projection Enhanced Up-sampling (BPEU) module alleviates the drawback of simple interpolation algorithms. It significantly enhances the quality of up-sampled features by employing back projection and detail compensation. Furthermore, the proposed Multi-Dimensional Attention (MDA) module adaptively extracts more expressive features along the spatial and channel dimensions. The proposed method is likewise evaluated on common datasets, and the experimental results show its effectiveness. |
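
The semantic compensation idea described for SCM can be illustrated with a minimal sketch. The abstract does not specify how the semantic flows are fused, so the code below makes several assumptions: single-channel feature maps stored as nested lists, nearest-neighbour up-sampling, each pyramid level at half the resolution of the one below it, and fusion by element-wise addition. The function names (`upsample_nearest`, `add_maps`, `top_down_with_compensation`) are hypothetical and are not taken from SANet.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour up-sampling of a 2-D list by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def add_maps(*grids):
    """Element-wise sum of equally sized 2-D lists."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*grids)]


def top_down_with_compensation(features):
    """Merge a feature pyramid ordered from finest to coarsest level.

    Assumption: each level adds (a) its own lateral feature, (b) the merged
    map from the level above, up-sampled by 2 (the ordinary top-down path),
    and (c) a "semantic flow", i.e. the top-level feature up-sampled
    directly to this level's resolution.
    """
    top = features[-1]
    merged = [top]
    for lateral in reversed(features[:-1]):
        from_above = upsample_nearest(merged[0], 2)
        flow = upsample_nearest(top, len(lateral) // len(top))
        merged.insert(0, add_maps(lateral, from_above, flow))
    return merged


def ones(size):
    """Square all-ones map, used only for the toy example below."""
    return [[1.0] * size for _ in range(size)]


pyramid = [ones(8), ones(4), ones(2), ones(1)]
merged = top_down_with_compensation(pyramid)
print([m[0][0] for m in merged])
```

In this toy pyramid of all-ones maps, the top-level feature reaches every level directly rather than only through repeated up-sampling of intermediate results, which is the intuition behind compensating for semantic dilution.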