Scene text detection is a popular computer vision task that aims to locate text regions in an image and help people understand the image faster. Many industries build products on this technology, for example in autonomous driving and financial analysis. Scene text detection still faces many difficulties, such as large differences in text size within an image, widely scattered text, and diverse fonts. To address these problems, this paper proposes three scene text detectors that significantly improve detection results. They are as follows:

1. Scene text detection algorithm based on a cross-layer attention gate. Traditional feature fusion methods tend to enlarge the differences between feature maps at different scales, inhibit the response values of some regions in the feature maps, and harm the representation ability of the network. Scattered text distribution also poses a challenge for scene text detection. A cross-layer attention gate module is proposed to better fuse features of different scales by integrating features at different levels, making the text features more comprehensive. In addition, to alleviate the problem of scattered text distribution in the image, this paper applies a position perception module to the feature map, emphasizing the regions that respond strongly to text features.

2. Scene text detection algorithm with multi-layer weight fusion and dual-modal perception. To alleviate the poor results caused by scattered text distribution, this paper introduces a dual-modal perception module that perceives text-region information in the features from local and global perspectives. To handle large differences in text scale, this paper proposes a multi-level weight fusion module that generates feature maps and weights with different receptive fields from the deepest feature maps of the network, so that the resulting feature maps have richer receptive fields and capture text features at different scales. In addition, this paper proposes a foreground-background enhancement branch to alleviate the misclassification of background pixels caused by complex image backgrounds; it suppresses misjudgment of the background area while strengthening supervision of the text area, thereby improving detection accuracy.

3. Scene text detection algorithm with multi-dimensional feature fusion and instance-wise loss. The scale gaps between text instances in scene images are very large. Under a typical loss function, small texts account for only a small proportion of the loss, so they tend to be missed during detection. To avoid this, this paper proposes an instance-wise loss function that assigns each pixel a weight inversely proportional to the area of its text instance, improving the detection of small texts. This paper also proposes a boundary refinement branch that strengthens the supervision of boundary pixels to generate more accurate boundaries. A multi-dimensional feature fusion module alleviates the problem of scattered text distribution by better extracting the spatial and location information of the text; it captures features from the perspectives of width, height and dimension and better emphasizes the important features of the text.

Extensive experiments are performed on three common scene text detection datasets, and the results compare favorably with state-of-the-art methods. Further experiments on the individual modules verify their validity.
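As an illustration of the instance-wise weighting idea described for the third detector, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a labelled instance mask (0 for background, k > 0 for the k-th text instance), uses binary cross-entropy as the base loss, takes the proportionality constant to be 1, and treats the background as a single region weighted by its own area. The function names `instance_weight_map` and `instance_wise_bce` are hypothetical, and all of these choices are assumptions made for the example.

```python
import math
from collections import Counter


def instance_weight_map(instance_mask):
    """Return per-pixel weights inversely proportional to region area.

    instance_mask: 2D list of ints; 0 is background, k > 0 is a text instance id.
    Assumption: the background is treated as one region with weight 1 / its area.
    """
    areas = Counter(label for row in instance_mask for label in row)
    return [[1.0 / areas[label] for label in row] for row in instance_mask]


def instance_wise_bce(pred, instance_mask, eps=1e-7):
    """Weighted binary cross-entropy over a text probability map (assumed base loss)."""
    weights = instance_weight_map(instance_mask)
    total = 0.0
    for pred_row, mask_row, weight_row in zip(pred, instance_mask, weights):
        for p, label, w in zip(pred_row, mask_row, weight_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            target = 1.0 if label > 0 else 0.0
            total += -w * (target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total


# Toy 4x4 mask: instance 1 covers 6 pixels, instance 2 covers a single pixel.
mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2],
]
weights = instance_weight_map(mask)
uniform_pred = [[0.5] * 4 for _ in range(4)]
loss = instance_wise_bce(uniform_pred, mask)
```

With this weighting, the weights within each region sum to 1, so the one-pixel instance contributes as much to the total loss as the six-pixel instance instead of being outweighed by it.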