Research On Scene Text Detection Based On Deep Learning

Posted on: 2019-04-24  Degree: Master  Type: Thesis
Country: China  Candidate: M Y En  Full Text: PDF
GTID: 2428330593950238  Subject: Software engineering
Abstract/Summary:
Text in natural scene images is an important source of information and carries rich, precise high-level semantics. Detecting and recognizing scene text therefore has great application value and has attracted much research interest over the last two decades. Early detection and recognition methods relied on hand-crafted text features. With the revival of deep learning, however, deep neural networks have shown a strong ability to learn features, and research based on deep neural networks, especially convolutional neural networks (CNNs), has become the mainstream of this field. Against this backdrop, the main task of this thesis is to study scene text detection based on deep convolutional networks.

To address multi-scale scene text detection, and small text detection in particular, we propose a new detection framework called the feature pyramid based scene text detector. The framework builds on the state-of-the-art object detection framework SSD and introduces a feature pyramid mechanism. Through top-down feature fusion, features from different depths of the CNN are combined into new features, forming a feature pyramid whose features have both high-level semantics and fine local details. Detecting on the newly built features improves performance on multi-scale and small text detection. On the ICDAR2013 benchmark, the proposed method achieves an F-score of 87.6%.

Most current state-of-the-art scene text detection methods need a large amount of data with bounding-box-level or pixel-level ground truth to train deep models, but obtaining such data requires expensive manual annotation. We therefore explore a weakly supervised method that trains a deep CNN model with text localization ability on datasets that have only image-level annotations. Given an input image, the model produces a 2-D class activation map (CAM) in which the value of each pixel denotes the confidence that the pixel belongs to a text region. With the help of the CAM, most background areas in the input image can be filtered out, leaving the areas where text may exist. Building on this, text proposals can be generated with MSER-based methods. The proposed weakly supervised method achieves a recall rate comparable to some fully supervised methods on the ICDAR2013 and ICDAR2015 benchmarks.
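
The CAM-based background filtering described above can be illustrated with a minimal sketch. It assumes the commonly used CAM formulation, in which the map is a weighted sum of the last convolutional feature maps using the classifier weights of the "text" class; the abstract does not give the thesis's actual network, normalization, or threshold, so the function names, the min-max scaling, and the 0.5 threshold below are all hypothetical choices. The MSER-based proposal step is not shown.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels: cam[y][x] = sum_k w[k] * f[k][y][x].

    feature_maps: list of K 2-D lists (one H x W map per channel).
    class_weights: list of K classifier weights for the "text" class
    (assumed to come from a layer after global average pooling).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam):
    """Min-max scale the map to [0, 1]; a flat map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def text_mask(cam, threshold=0.5):
    """Keep pixels whose normalized score reaches the (assumed) threshold."""
    return [[v >= threshold for v in row] for row in normalize(cam)]


# Toy example: two 2 x 3 feature maps and illustrative class weights.
toy_features = [
    [[0.0, 1.0, 2.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
]
toy_weights = [1.0, -1.0]
toy_cam = class_activation_map(toy_features, toy_weights)
toy_mask = text_mask(toy_cam)
```

In this toy case the pixels in the right-hand part of the map survive the threshold, and only those regions would be passed on to a proposal generator.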
Keywords/Search Tags: scene text, convolutional neural networks, weak supervision, deep learning