Research On Multi-oriented Scene Text Localization And Detection Based On Multi-scale And Big Receptive Field Deep Learning Features

Posted on: 2020-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Yang
GTID: 2428330590460925
Subject: Electronic and communication engineering
Abstract/Summary:
Words are among the most important media for abstracting and expressing thought, because they can carry rich semantic information. By locating and outlining the text in a scene, scene text detection is the first step towards subsequent text recognition and image understanding. Traditional scene text detection requires good knowledge of character structure in order to extract robust features, and it generalizes poorly and lacks robustness. Treating scene text detection as an object detection and segmentation problem within a deep learning framework offers a new methodology for solving it. Compared with traditional methods, deep learning methods are more robust and perform better. This thesis proposes a new FCN-based multi-oriented scene text detection model with a larger receptive field, to overcome the weakness of FCN-based text detection models in detecting text instances whose areas and scales vary widely, and in detecting large text instances. The main contributions of this thesis are as follows:

1. To improve detection performance on text instances with a wide range of areas and scales, a multi-channel convolution layer structure with multi-scale kernel sizes is proposed:
(1) A deeper feature-fusion structure is adopted. Specifically, the dimension-converted feature maps from the lower (2nd and 3rd) conv-layers are combined, which yields a bigger and denser feature map and provides a more precise feature tensor for dense prediction;
(2) Inspired by the Inception module of GoogLeNet, a multi-convolution layer structure is designed to extract a wider range of local information about scene text, resulting in a scene text detection model with higher performance.

2. To address the unsatisfactory precision of the text detection model on text instances of large resolution, a scene text detection model with a larger receptive field is designed:
(1) Based on the model in part 1, the 3×3 kernel in the 6th conv-layer is dilated with a dilation rate of 2, and the 5th pooling layer and the corresponding up-sampling module are adjusted accordingly, which enlarges the receptive field of the text detection model;
(2) Based on the model in part (1), new multi-channel convolution structures are explored to obtain an optimized model with fewer parameters: 1) A multi-channel conv-layer with dilated convolution is built: the 5×5 kernel in the multi-channel conv-layer is replaced by a 3×3 kernel with a dilation rate of 2, which gives a simplified model; 2) An uneven multi-channel, multi-kernel structure is used, constructed by reducing the number of channels of the 1×1 and 5×5 convolutional kernels, which strengthens the local information learned by the 3×3 kernels and reduces the parameters of the conv-layer.

All scene text detection experiments are carried out on RCTW17, a Chinese scene text dataset. The multi-channel FCN scene text detection model with more feature fusion modules achieves an average improvement of 23.5% in performance, which demonstrates the effectiveness of multi-channel convolution. With a dilated convolution structure in the 6th conv-layer, the new model improves recall, precision, and F1-measure, which demonstrates that using dilated convolution to obtain a larger receptive field is effective. Further experiments are carried out to optimize the multi-channel convolution structure: (1) Applying dilated convolution in the multi-channel convolution layer degrades recall, precision, and F1-measure, which indicates that it does not help improve scene text detection performance. (2) The proposed uneven multi-channel convolution layer degrades performance slightly but makes the model easier to train. The best-performing model in this thesis achieves 0.541, 0.669, and 0.598 in recall, precision, and F1-measure, respectively. Compared with FTSN and SegLink, which represent state-of-the-art scene text detection algorithms, the proposed method achieves higher recall and F1-measure in experiments on the same datasets, which demonstrates its superiority.
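The exchange of a 5×5 kernel for a 3×3 kernel with a dilation rate of 2 rests on standard dilated-convolution arithmetic: a k×k kernel with dilation d covers k + (k − 1)(d − 1) input positions per side while still holding only k² weights. The following is a minimal sketch of that arithmetic, assuming square kernels and counting weights for a single input/output channel pair; the function names are illustrative and do not come from the thesis.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length of the input window covered by a dilated square kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weights_per_channel_pair(kernel_size: int) -> int:
    """Weights in one square kernel; dilation leaves this count unchanged."""
    return kernel_size * kernel_size


# A 3x3 kernel with dilation rate 2 spans the same window as a plain 5x5 kernel.
dilated_span = effective_kernel_size(3, 2)
plain_span = effective_kernel_size(5, 1)
print(dilated_span, plain_span)

# The dilated 3x3 kernel needs 9 weights where the 5x5 kernel needs 25.
print(weights_per_channel_pair(3), weights_per_channel_pair(5))
```

The window size matches, but the dilated kernel samples it sparsely, which may be one reason a dilated layer does not always perform like the dense 5×5 layer it stands in for.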
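The reported metrics can be cross-checked against the standard definition of the F1-measure as the harmonic mean of precision and recall. The sketch below uses only the recall and precision values quoted in the abstract; the function name is illustrative.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the best-performing model.
reported_recall = 0.541
reported_precision = 0.669

f1 = f1_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.598
```

The computed value agrees with the reported F1-measure of 0.598.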
Keywords/Search Tags: multi-oriented scene text detection, fully convolutional neural network, multi-channel convolution, dilated convolution