
Research On Key Technologies Of Multi-oriented And Arbitrary-shaped Scene Text Detection

Posted on: 2021-03-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Xiao | Full Text: PDF
GTID: 2428330647951060 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the emergence of deep learning methods represented by Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), research on scene text detection has made new progress. However, scene text detection remains a very challenging task for two reasons. First, images of natural scenes often have complex backgrounds that can easily interfere with detection. Second, text in natural scenes takes very diverse forms: horizontal and inclined text, and straight and curved text, may all appear in the same image. To better address multi-oriented and arbitrary-shaped scene text detection, this thesis studies the key technologies of this problem based on Mask R-CNN and proposes two algorithms. The main contents of this thesis are as follows:

(1) Text-like objects in the backgrounds of scene images are easily misclassified as text. To address this, this thesis proposes a scene text detection algorithm that combines an attention mechanism with instance segmentation. Building on Mask R-CNN, it introduces a new attention module, the Text-context-aware Attention Module (TCAM). In the network architecture of this algorithm, a TCAM is connected to each level of the feature pyramid of the original Mask R-CNN. TCAM uses channel attention and spatial attention simultaneously and combines the two by addition. TCAM effectively suppresses false-positive detection boxes produced by text-like objects in the background, thereby improving detection performance. The proposed algorithm achieves F-measures of 84.60% and 70.20% on the ICDAR2015 and ICDAR2017-MLT datasets, respectively.

(2) To better handle the variation in scale of scene text, this thesis further proposes a scene text detection algorithm based on multi-level feature fusion. Building on Mask R-CNN, it proposes a Pyramid Feature Fusion Module and a Multi-layer RoI Feature Fusion Module, which improve how the feature pyramid of the original Mask R-CNN is constructed and used, strengthening the algorithm's ability to handle text of varying scale. The Pyramid Feature Fusion Module uses both top-down and bottom-up feature fusion paths, so that the information in shallow and deep features is fully exchanged and fused. This module simultaneously strengthens the representational power of shallow features for detecting small text and of deep features for detecting large text, improving detection performance on both small and large text. The Multi-layer RoI Feature Fusion Module combines all levels of the feature pyramid when extracting the features used to make predictions for text candidate regions, so that the extracted features better capture both the local and global characteristics of text instances, further improving the overall detection performance of the algorithm. Finally, the algorithm uses deformable convolution in its backbone network, which further enhances its ability to handle text of varying scale. The proposed algorithm achieves F-measures of 93.01%, 87.80%, 76.39% and 84.15% on the ICDAR2013, ICDAR2015, ICDAR2017-MLT and SCUT-CTW1500 datasets, respectively.
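The abstract states only that TCAM applies channel attention and spatial attention at the same time, combines the two by addition, and is attached to each level of the feature pyramid. The NumPy sketch below illustrates that general pattern under explicit assumptions; it is not the thesis's actual TCAM. The channel branch is assumed to be a squeeze-and-excitation-style bottleneck (weights `w1`, `w2`), the spatial branch is assumed to mix the channel-wise mean and max maps with scalar weights (`a`, `b`, `bias`) instead of a learned convolution, and all function names and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Channel weights for a (C, H, W) map: avg-pool, bottleneck MLP, sigmoid."""
    pooled = feat.mean(axis=(1, 2))        # (C,) global average pooling
    hidden = np.maximum(w1 @ pooled, 0.0)  # (C // r,) bottleneck with ReLU
    return sigmoid(w2 @ hidden)            # (C,) one weight in (0, 1) per channel


def spatial_attention(feat, a, b, bias):
    """Per-pixel weights from the channel-wise mean and max maps.

    A real module would typically convolve these two maps with a learned
    kernel; a 1x1 mix with scalars a, b, bias is assumed here for brevity.
    """
    avg_map = feat.mean(axis=0)            # (H, W)
    max_map = feat.max(axis=0)             # (H, W)
    return sigmoid(a * avg_map + b * max_map + bias)  # (H, W), values in (0, 1)


def tcam_like(feat, w1, w2, a=1.0, b=1.0, bias=0.0):
    """Hypothetical TCAM-style block: reweight by each branch, then add."""
    ch = channel_attention(feat, w1, w2)[:, None, None]   # (C, 1, 1)
    sp = spatial_attention(feat, a, b, bias)[None, :, :]  # (1, H, W)
    return feat * ch + feat * sp  # the two attention branches combined by addition


# Usage: apply one block to every level of a toy four-level feature pyramid.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
pyramid = [rng.standard_normal((C, H // 2**i, W // 2**i)) for i in range(4)]
refined = [tcam_like(p, w1, w2) for p in pyramid]
print([f.shape for f in refined])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4), (8, 2, 2)]
```

Sharing `w1` and `w2` across pyramid levels is only to keep the sketch short; separate parameters per level are equally plausible. Because both branches output weights in (0, 1), a background region scored low by both branches is attenuated, which is the kind of suppression of text-like background the abstract describes.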
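The Pyramid Feature Fusion Module is described as using both a top-down and a bottom-up fusion path so that shallow and deep features exchange information. A minimal sketch of that two-path pattern, in the spirit of FPN followed by a PANet-style bottom-up augmentation, is shown below. It assumes element-wise addition as the fusion operator, nearest-neighbour upsampling, and 2x2 average pooling in place of the learned stride-2 convolutions a real network would use; the smoothing convolutions are omitted and the function names are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 average pooling, a stand-in for a learned stride-2 convolution."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def fuse_pyramid(levels):
    """Top-down then bottom-up fusion over a feature pyramid.

    levels: (C, H, W) maps ordered shallow (largest) to deep (smallest), each
    half the height and width of the previous one, already projected to the
    same channel count C (e.g. by 1x1 lateral convolutions).
    """
    # Top-down path: deep, semantically strong features flow into shallow levels.
    top_down = [levels[-1]]
    for feat in reversed(levels[:-1]):
        top_down.insert(0, feat + upsample2x(top_down[0]))
    # Bottom-up path: shallow, spatially precise features flow back into deep levels.
    bottom_up = [top_down[0]]
    for feat in top_down[1:]:
        bottom_up.append(feat + downsample2x(bottom_up[-1]))
    return bottom_up


levels = [np.ones((4, 32 // 2**i, 32 // 2**i)) for i in range(4)]
fused = fuse_pyramid(levels)
print([f.shape for f in fused])            # [(4, 32, 32), (4, 16, 16), (4, 8, 8), (4, 4, 4)]
print([float(f[0, 0, 0]) for f in fused])  # [4.0, 7.0, 9.0, 10.0]
```

With all-ones inputs the printed values show the effect of the two paths: after the top-down pass the shallowest level already aggregates all four levels (4.0), and after the bottom-up pass the deepest level aggregates contributions routed through every shallower level (10.0).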
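The Multi-layer RoI Feature Fusion Module is described as extracting the features of each text candidate region from all levels of the feature pyramid, but the abstract does not say how the per-level features are combined. The sketch below assumes element-wise max fusion (the choice made in PANet's adaptive feature pooling) and uses nearest-neighbour sampling at bin centres as a simplified stand-in for RoIAlign, which would interpolate bilinearly. The function names, strides and box are hypothetical.

```python
import numpy as np


def roi_sample(feat, box, stride, k=7):
    """Sample a k x k grid for one box from one (C, H, W) pyramid level.

    box is (x0, y0, x1, y1) in input-image pixels; stride is this level's stride.
    Nearest-neighbour sampling at bin centres (simplified stand-in for RoIAlign).
    """
    c, h, w = feat.shape
    x0, y0, x1, y1 = (v / stride for v in box)
    xs = x0 + (np.arange(k) + 0.5) * (x1 - x0) / k  # bin-centre x coordinates
    ys = y0 + (np.arange(k) + 0.5) * (y1 - y0) / k  # bin-centre y coordinates
    xi = np.clip(np.floor(xs).astype(int), 0, w - 1)
    yi = np.clip(np.floor(ys).astype(int), 0, h - 1)
    return feat[:, yi[:, None], xi[None, :]]        # (C, k, k)


def multi_level_roi_feature(levels, strides, box, k=7):
    """Pool the same box from every level and fuse the results element-wise (max)."""
    pooled = [roi_sample(f, box, s, k) for f, s in zip(levels, strides)]
    return np.max(np.stack(pooled), axis=0)         # (C, k, k)


# Usage: a 128 x 128 image with a four-level pyramid (strides 4 to 32);
# each level is filled with a constant so the fused result is easy to check.
strides = [4, 8, 16, 32]
levels = [np.full((8, 128 // s, 128 // s), float(i + 1)) for i, s in enumerate(strides)]
roi_feat = multi_level_roi_feature(levels, strides, box=(10, 20, 90, 60))
print(roi_feat.shape)  # (8, 7, 7)
```

Mask R-CNN with a feature pyramid normally assigns each RoI to a single pyramid level according to its size; pooling the same region from every level, as sketched here, lets each candidate draw on both fine local detail and coarser, more global context.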
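The results above are reported as F-measure, the harmonic mean of precision P and recall R: F = 2PR / (P + R), which for M matched detections out of D detections and G ground-truth instances equals 2M / (D + G). A detection typically counts as matched when its IoU with an unused ground-truth instance reaches 0.5. The sketch below is a simplification under stated assumptions: it uses axis-aligned boxes and greedy one-to-one matching, whereas the official ICDAR and SCUT-CTW1500 protocols work on quadrilaterals or polygons and add further rules (such as "do not care" regions). The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching: each detection takes its best unused ground truth."""
    used = set()
    matched = 0
    for det in detections:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j not in used:
                score = iou(det, gt)
                if score > best_iou:
                    best_iou, best_j = score, j
        if best_j is not None and best_iou >= threshold:
            used.add(best_j)
            matched += 1
    return matched


def f_measure(matched, num_detections, num_ground_truth):
    """Harmonic mean of precision and recall."""
    precision = matched / num_detections if num_detections else 0.0
    recall = matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


dets = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (21, 21, 31, 31)]
m = count_matches(dets, gts)
print(m, round(f_measure(m, len(dets), len(gts)), 4))  # 2 0.8
```

In the example the third detection overlaps no ground truth, so it is a false positive: precision drops to 2/3 while recall stays at 1, giving F = 0.8. Suppressing such background false positives is exactly what raises precision, and hence F-measure, in the attention-based algorithm.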
Keywords/Search Tags: scene text detection, convolutional neural network, attention mechanism, feature fusion, deformable convolution