| As an information carrier in daily life, text in natural scenes usually contains rich and precise information. Therefore, text detection and recognition in natural scene images have a broad scope of application and high commercial value in computer vision-based services. In recent years, deep learning has provided new ideas for computer vision and has achieved breakthroughs in a wide range of fundamental tasks. Because scene text is highly variable in layout, size, font, image quality, and other aspects, locating text accurately remains difficult. Although existing works have made great progress in complex scene text detection, they still face three challenges: first, how to further improve detection accuracy so as to reduce false and missed detections; second, how to detect text boundaries accurately; and third, how to simplify the model and improve detection efficiency. To overcome these problems, this thesis explores various deep learning algorithms and conducts a series of studies on scene text detection from the perspectives of feature extraction and fusion, long-range information acquisition, and model lightweighting. (1) To address background noise interference and missed detection of small text, a scene text detection model based on attention feature fusion and enhancement (AFFE-Net) is designed. First, an attention mechanism is introduced in the decoding stage to extract both detailed and global information, effectively enhancing the representation ability of features at different levels. Second, before the detection head, the model captures the relationships among the concatenated features along both the channel and spatial dimensions and generates a joint feature weight mask for feature weighting. This aims to thoroughly eliminate the negative impact of background noise on text detection and to effectively reduce false and missed detections. (2) To address incomplete detection of text edges, a text detection model based on multi-scale joint prediction (MFJP-Net) is derived. First, an atrous convolutional feature pyramid is applied to the multi-scale cascaded features to enlarge the receptive field, so that more long-range information is mined and text edges are segmented accurately. Second, dual detection heads are employed to obtain multi-scale prediction results, which are then fused. In addition, the Dice loss function is used during training to alleviate the model's bias towards the background when positive and negative samples in the training data are imbalanced, which effectively improves model performance and accelerates convergence. (3) To address the large number of model parameters, long computation time, and insufficient feature extraction capability of lightweight backbone networks, a lightweight text detection model based on global-guidance bidirectional feature fusion (BFPF-Net) is proposed. First, bidirectional feature fusion is fully exploited to add convolutional layers, which improves the feature extraction capability of the network. Second, a global semantic guidance branch is designed to supplement spatial and semantic information during feature fusion, improving the information richness of the fused features. In addition, depthwise separable convolutions replace part of the ordinary convolutions, reducing overall model complexity and the number of parameters. Experimental results on the irregular text dataset Total-Text and the multi-oriented text dataset ICDAR-2015 demonstrate the strong performance of the proposed models. Compared with traditional methods, precision (P), recall (R), F1 score, detection speed, and edge detection integrity are all improved to a certain extent. |
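
The receptive-field gain from the atrous (dilated) convolutions in (2) can be checked with simple arithmetic. This is a sketch under assumptions: the dilation rates (1, 2, 4) and the plain series stack of stride-1 3x3 layers are hypothetical, since the abstract does not give MFJP-Net's actual pyramid configuration. A k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels per side, and each stride-1 layer adds that span minus one to the receptive field.

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Span (in pixels) covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def stacked_receptive_field(k: int, dilations: list[int]) -> int:
    """Receptive field of a series of stride-1 convolutions, one per dilation rate."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(k, d) - 1
    return rf


# Hypothetical configuration: three 3x3 layers, with and without dilation.
print(stacked_receptive_field(3, [1, 1, 1]))  # 7  (plain convolutions)
print(stacked_receptive_field(3, [1, 2, 4]))  # 15 (atrous convolutions)
```

Every layer still holds a 3x3 set of weights, so the wider context comes at no extra parameter cost. For parallel (pyramid-style) branches rather than a series, each branch simply sees `effective_kernel(k, d)` pixels of its input.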
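
To see why the Dice loss in (2) counters foreground/background imbalance, consider a minimal NumPy sketch of one common soft Dice formulation, 1 - 2*sum(p*g) / (sum(p) + sum(g)). The abstract does not state which variant the thesis uses (squared denominators, smoothing constant, and so on), so the formula and the `eps` term here are assumptions, and the toy map with 1% text pixels is hypothetical.

```python
import numpy as np


def dice_loss(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss between a probability map `pred` and a binary mask `gt`."""
    p = pred.astype(np.float64).ravel()
    g = gt.astype(np.float64).ravel()
    intersection = np.sum(p * g)
    denominator = np.sum(p) + np.sum(g)
    # eps avoids 0/0 when both maps are empty
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))


# Toy ground truth: a 10x10 text region in a 100x100 image (1% positives).
gt = np.zeros((100, 100))
gt[45:55, 45:55] = 1.0

all_background = np.zeros_like(gt)  # degenerate "predict nothing" output
pixel_accuracy = float(np.mean(all_background == gt))

print(pixel_accuracy)                 # 0.99: looks good per pixel
print(dice_loss(all_background, gt))  # close to 1.0: heavily penalized
print(dice_loss(gt, gt))              # 0.0 for a perfect prediction
```

Per-pixel accuracy rewards the all-background output, whereas the Dice term depends only on overlap with the text region, so the trivial solution cannot score well.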
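
The parameter saving from replacing ordinary convolutions with depthwise separable ones in (3) can be illustrated with a small NumPy sketch (stride 1, no padding, no bias; the channel sizes are hypothetical, not BFPF-Net's). A depthwise separable convolution applies one k x k filter per input channel, then a 1x1 pointwise convolution to mix channels. It equals an ordinary convolution whose weight is factorized as W[o, c, i, j] = pw[o, c] * dw[c, i, j], which is why it needs k*k*C_in + C_in*C_out weights instead of k*k*C_in*C_out.

```python
import numpy as np


def standard_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Ordinary convolution. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    k = weight.shape[-1]
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((weight.shape[0], out_h, out_w))
    for i in range(k):
        for j in range(k):
            # every output channel mixes every input channel at each kernel tap
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], x[:, i:i + out_h, j:j + out_w])
    return out


def depthwise_separable_conv(x: np.ndarray, dw: np.ndarray, pw: np.ndarray) -> np.ndarray:
    """Depthwise conv (dw: (C_in, k, k)) followed by pointwise 1x1 conv (pw: (C_out, C_in))."""
    k = dw.shape[-1]
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    mid = np.zeros((x.shape[0], out_h, out_w))
    for i in range(k):
        for j in range(k):
            # each channel is filtered only by its own k x k kernel
            mid += dw[:, i, j][:, None, None] * x[:, i:i + out_h, j:j + out_w]
    # 1x1 convolution: a linear mix of channels at every pixel
    return np.einsum("oc,chw->ohw", pw, mid)


rng = np.random.default_rng(0)
c_in, c_out, k = 8, 16, 3
x = rng.standard_normal((c_in, 12, 12))
dw = rng.standard_normal((c_in, k, k))
pw = rng.standard_normal((c_out, c_in))

# The separable layer matches an ordinary conv with the factorized weight.
factorized = pw[:, :, None, None] * dw[None, :, :, :]
y_sep = depthwise_separable_conv(x, dw, pw)
print(y_sep.shape)                                       # (16, 10, 10)
print(np.allclose(y_sep, standard_conv(x, factorized)))  # True
print(dw.size + pw.size, factorized.size)                # 200 1152
```

At a more typical width, a 3x3 layer with 256 input and 256 output channels needs 9*256 + 256*256 = 67,840 weights in separable form versus 9*256*256 = 589,824 for an ordinary convolution, roughly an 8.7x reduction.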