| Text detection in natural scenes is an important research field in computer vision. As the step that precedes character recognition, it directly affects recognition accuracy. Horizontal and near-horizontal text detection algorithms for natural scenes already achieve good results, but multi-directional text detection has not been well solved. To address this problem, this paper proposes two multi-directional text detection models for natural scenes. To improve detection accuracy, a deep supervision mechanism is applied to the convolutional neural network, and features from multiple stages of the network are fused when the text area is predicted. The main work is as follows: (1) A deeply supervised text detection model based on regression prediction is proposed. The model predicts whether each pixel in the feature map is text, and regresses the geometric features of the corresponding pixel to detect the text region. The deep supervision mechanism is applied in the detection model: features extracted by convolutional layers at different stages are upsampled, and the text region in the image is predicted from each upsampled result, so that the convolutional layers can adjust their parameters according to multiple prediction losses. To decide whether a pixel is text, the model predicts both the probability that the pixel belongs to a text region and the probability that it belongs to a non-text region, and takes the class with the larger value. This approach eliminates the need to set a threshold manually and makes predictions more accurate. Compared with the EAST algorithm, the model improves the F-score by 2.24% and 2.28% on the ICDAR2015 and MSRA-TD500 datasets, respectively. (2) A deeply supervised text detection model based on semantic segmentation is proposed. The model classifies each feature-map pixel as text or non-text, and predicts the connection relationship 
between each pixel and its surrounding pixels. The model detects multi-directional text regions by clustering the pixels that belong to the same text region. This semantic segmentation model also applies the deep supervision mechanism and combines multi-stage features when predicting its output to make predictions more accurate. The model was trained on the images in ICDAR2015 and outperforms PixelLink, an earlier detection algorithm, in recall, accuracy, and F-score. Both models adopt the deep supervision mechanism and fuse multi-stage features during training. On the one hand, the deep supervision mechanism upsamples the feature maps extracted at multiple stages of the convolutional neural network, predicts the text area in the image from each, and calculates the prediction loss. In this way, the network parameters can be adjusted according to the prediction losses at multiple stages, which provides more feedback to each convolutional layer. On the other hand, to improve detection accuracy, multi-stage feature fusion combines the feature maps from the higher convolutional layers, which carry more global and abstract information, with the feature maps from the lower convolutional layers, which are rich in spatial location information. The experimental results show that the deep supervision mechanism and multi-stage feature fusion improve the performance of both text detection models. |
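
As an illustration of two steps described above, the following minimal Python sketch labels each pixel by comparing its text and non-text probabilities (so no threshold is needed) and then clusters text pixels through their predicted connections. It is not the paper's implementation. The function names, the 8-neighborhood, the nested-list data layout, the use of union-find, and the rule that a positive link in either direction joins two pixels are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]

# Assumed 8-neighborhood; the abstract only says "surrounding pixels".
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def classify_pixels(p_text: List[List[float]],
                    p_non_text: List[List[float]]) -> List[List[bool]]:
    """Mark a pixel as text when its text probability is the larger of the two."""
    return [[pt > pn for pt, pn in zip(row_t, row_n)]
            for row_t, row_n in zip(p_text, p_non_text)]


def find(parent: Dict[Pixel, Pixel], x: Pixel) -> Pixel:
    """Return the root of x in the union-find forest, halving paths on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_text_pixels(is_text: List[List[bool]],
                        links: List[List[List[bool]]]) -> List[List[Pixel]]:
    """Group text pixels into regions.

    links[r][c][k] is True when pixel (r, c) is predicted to be connected
    to its k-th neighbor in NEIGHBORS (hypothetical layout).
    """
    h, w = len(is_text), len(is_text[0])
    parent = {(r, c): (r, c) for r in range(h) for c in range(w) if is_text[r][c]}
    for (r, c) in list(parent):
        for k, (dr, dc) in enumerate(NEIGHBORS):
            neighbor = (r + dr, c + dc)
            # Merge only when the neighbor is also text and the link is positive.
            if neighbor in parent and links[r][c][k]:
                root_a, root_b = find(parent, (r, c)), find(parent, neighbor)
                if root_a != root_b:
                    parent[root_b] = root_a
    regions: Dict[Pixel, List[Pixel]] = {}
    for pixel in parent:
        regions.setdefault(find(parent, pixel), []).append(pixel)
    return sorted(sorted(group) for group in regions.values())


if __name__ == "__main__":
    # Toy 1x4 map: the third pixel is non-text and splits the row in two.
    p_text = [[0.9, 0.8, 0.1, 0.7]]
    p_non_text = [[0.1, 0.2, 0.9, 0.3]]
    is_text = classify_pixels(p_text, p_non_text)
    links = [[[True] * 8 for _ in range(4)]]
    print(cluster_text_pixels(is_text, links))
    # prints [[(0, 0), (0, 1)], [(0, 3)]]
```

Each resulting group of pixels would then be converted into an oriented bounding box for the text region; that step is omitted here because the abstract does not describe it.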