Research On Multi-oriented Scene Text Detection And Recognition Based On Deep Neural Network

Posted on: 2020-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: G F Hou
GTID: 2428330602452529
Subject: Computer Science and Technology
Abstract/Summary:
Text carries rich semantic information and is one of the most important media for transmitting information; text in images helps people understand scenes better. Scene text images often suffer from complicated backgrounds, blurring, insufficient illumination and perspective deformation. In addition, scene text varies in color, font, aspect ratio, orientation, language and so on. These problems make it difficult to detect and recognize text in natural scenes. In recent years, especially in the deep learning era, scene text detection and recognition has become a research hotspot in computer vision. To address these difficulties, this thesis proposes a rotation-aware text proposal network (RTPN), based on deep neural networks, for multi-scale, multi-oriented and multi-lingual scene text detection. Building on scene text detection, scene text recognition is then studied, and a residual recurrent network is proposed for multi-lingual scene text recognition. The details are as follows:

(1) Inspired by Faster R-CNN, a rotation-aware text proposal network is proposed for multi-oriented scene text detection. First, ResNet-101 is used as the backbone for feature extraction. Then, a rotational anchor mechanism is designed to generate multi-oriented proposals and detect multi-oriented text regions. Next, a rotational ROI Align pooling layer is introduced to obtain fixed-size feature vectors from the multi-oriented text proposals and the feature maps of the preceding convolutional layers. Finally, an improved non-maximum suppression (NMS) step eliminates redundant candidate boxes and produces the final text bounding boxes. The method detects multi-oriented, multi-scale and multi-lingual scene text simultaneously, with both high accuracy and high efficiency. It achieves F-measures of 0.88, 0.84, 0.83 and 0.61 on four standard benchmarks, ICDAR 2013, ICDAR 2015, MSRA-TD500 and RCTW-17, respectively. The experimental results verify the effectiveness of the method.

(2) A residual recurrent neural network is proposed for multi-lingual scene text recognition. The network consists of two stages: an encoding stage and a decoding stage. In the encoding stage, ResNet first extracts features from the input image. To capture the contextual information of the text, hierarchical bidirectional long short-term memory (BLSTM) layers are then applied to obtain the character feature sequence. The encoded character feature sequence is fed into the decoding network. In the decoding stage, an attention mechanism is introduced to handle character sequences of arbitrary length. With attention, the network can learn global information about the character sequence, which improves recognition accuracy. To avoid the vanishing and exploding gradients of the traditional RNN, the traditional RNN unit is replaced by a GRU (Gated Recurrent Unit). Using a GRU as the decoding network also reduces the number of parameters in the network, which accelerates training. The proposed method can effectively recognize multi-lingual scene text. For English text recognition, the method is evaluated on IIIT5K, SVT, ICDAR 2013 and ICDAR 2015, where it achieves recognition accuracies of 0.825, 0.863, 0.912 and 0.723, respectively. For mixed Chinese and English text recognition, MSRA-TD500, RCTW-17 and self-made datasets are used for evaluation, and the experimental results verify the effectiveness of the proposed method. In addition, the recognition network is combined with RTPN for end-to-end scene text recognition, which can effectively recognize multi-lingual (English and Chinese) scene text.
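
The rotational anchor mechanism mentioned in part (1) can be illustrated with a minimal Python sketch. The abstract does not give the anchor scales, aspect ratios, angles or feature stride, so every numeric value below is an assumption chosen for illustration, and the function names `rotated_anchors` and `corners` are hypothetical rather than taken from the thesis. The sketch only shows the general idea of adding an angle to each anchor so that proposals can follow tilted text.

```python
import math
from itertools import product


def rotated_anchors(feat_h, feat_w, stride, scales, ratios, angles_deg):
    """Enumerate rotated anchors (cx, cy, w, h, angle) over a feature map.

    Each feature-map cell gets one anchor per (scale, ratio, angle) combination.
    The ratio is width / height, and the anchor area is always scale ** 2.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of the cell, mapped back to input-image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio, angle in product(scales, ratios, angles_deg):
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx, cy, w, h, angle))
    return anchors


def corners(anchor):
    """Return the four corner points of a rotated anchor."""
    cx, cy, w, h, angle = anchor
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    # Rotate each corner offset about the centre, then translate.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in half]


if __name__ == "__main__":
    # Illustrative settings only; not the values used in the thesis.
    demo = rotated_anchors(2, 3, 16, scales=[32, 64], ratios=[2, 5],
                           angles_deg=[-30, 0, 30])
    print(len(demo))          # 2 * 3 cells * 2 scales * 2 ratios * 3 angles = 72
    print(corners(demo[0]))
```

Wide ratios and several angles are used here because text lines are typically elongated and may be tilted; a real detector would regress offsets, including an angle offset, from these anchors to the ground-truth boxes.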
Keywords/Search Tags:Deep Neural Network, Multi-oriented, Scene Text Detection, Multi-lingual, Scene Text Recognition