
Research On Deep Learning Based Natural Scene Text Detection And Localization

Posted on: 2019-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Luo
Full Text: PDF
GTID: 2428330566486884
Subject: Engineering
Abstract/Summary:
Scene text is one of the most common visual objects in natural scene images, appearing frequently on road signs, product packages, and similar surfaces. Because text carries rich semantic information, extracting and understanding the text embedded in natural scenes has become an active research topic. Text detection, usually the first step in a text reading system, aims to localize text regions with word-level bounding boxes, and it plays a critical role in the whole pipeline of text information extraction and understanding.

Driven by the rapid development of deep learning, text detection techniques based on deep networks are becoming popular. Since text in an image can be regarded as a special kind of object, horizontal text can be localized with the help of a deep object detection network. Unlike general objects, however, horizontal scene text regions tend to vary more in scale and to have arbitrarily oriented features, so detection is easily disturbed by text-like image backgrounds. In this thesis, we propose a deep learning based algorithm for natural scene text detection and localization. In summary, our main work and contributions are twofold:

1. Based on SSD, a recent development in object detection, we propose a text localization neural network. An input image yields its final detection results through a single forward pass of the network followed by standard non-maximum suppression. The original SSD model does not perform well when applied directly to text detection. By taking the characteristics of scene text into account, we make several improvements to SSD that make our text localization network better suited to the task:

(1) We design Text Detection Layers based on SSD. Because the original SSD is designed mainly for multi-class object detection, these layers are specialized to detect the single text class quickly.

(2) At each location of the feature map, the Text Detection Layers adopt two improved strategies when generating default text bounding boxes for prediction. First, we design aspect ratios for the default boxes that are better suited to the different scales of horizontal text regions in scene images. Second, to handle dense text regions in the input image, we add vertical offsets to the default boxes within each feature map cell. This small change increases the number of default text boxes, which may cover scene text better.

(3) Inspired by the design of the Inception Module in SSTD, we adopt 1×5 rectangular convolution kernels when sliding over the multiple feature maps to produce the output of the Text Detection Layers. Using 1×5 kernels instead of regular 3×3 kernels makes our network better suited to detecting long text regions.

2. To eliminate scene background regions that the text localization network mistakes for text, we propose a text verification model based on an encoder-decoder structure. The encoder combines a CNN with a BiGRU to encode the input text regions, producing image context feature vectors. The decoder uses a BiGRU network to decode the feature vector sequence, and we apply an attention mechanism during decoding, which helps the model focus on the current input.

Finally, we construct a horizontal scene text dataset whose images are taken by camera or drawn from the RCTW 2017 dataset. The dataset contains 12,000 training images and 3,000 test images, and we run our experiments on it. The experiments show that, compared with the original SSD model, our text localization neural network improves text detection accuracy from 75.9% to 80.5% and recall from 56.8% to 78.4%. The proposed text verification model brings a further 3.1% improvement in detection accuracy over the result of the text localization network. Compared with the deep learning based text detection algorithms that lead the ICDAR scene text localization competitions, our method achieves the highest text detection recall, 78.4%. Its accuracy of 83.0% is only slightly lower than that of CTPN. Overall, the proposed method achieves competitive performance.
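The default-box strategy and the non-maximum suppression step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the aspect ratios `(1, 2, 3, 5, 7, 10)`, the half-cell vertical offset, the box scale, and the IoU threshold of 0.5 are all assumptions chosen for illustration, since the abstract does not state the values actually used.

```python
import math
from typing import List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in normalized image coordinates.
Box = Tuple[float, float, float, float]


def default_text_boxes(
    fmap_h: int,
    fmap_w: int,
    scale: float,
    aspect_ratios: Sequence[float] = (1, 2, 3, 5, 7, 10),  # assumed values
    vertical_offsets: Sequence[float] = (0.0, 0.5),        # assumed, in cell units
) -> List[Box]:
    """Generate wide default boxes for every feature map cell.

    Each cell gets one box per (vertical offset, aspect ratio) pair, so
    adding offsets multiplies the number of default boxes per cell.
    """
    cell_h, cell_w = 1.0 / fmap_h, 1.0 / fmap_w
    boxes: List[Box] = []
    for i in range(fmap_h):
        for j in range(fmap_w):
            cx = (j + 0.5) * cell_w
            for dy in vertical_offsets:
                cy = (i + 0.5 + dy) * cell_h  # shift the center downwards
                for ar in aspect_ratios:
                    w = scale * math.sqrt(ar)  # wider for larger ratios
                    h = scale / math.sqrt(ar)
                    boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep: List[int] = []
    for k in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[k], boxes[m]) <= iou_threshold for m in keep):
            keep.append(k)
    return keep
```

Under these assumed settings, a 2×3 feature map yields 6 cells × 2 offsets × 6 aspect ratios = 72 default boxes, twice the number produced without the vertical offset, which is the effect the abstract attributes to this strategy.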
Keywords/Search Tags:text localization, object detection, SSD, encoder-decoder model