Oriented Scene Text Detection With Deep Neural Network

Posted on: 2019-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yang
GTID: 2428330545495407
Subject: Computer technology
Abstract:
Words are high-level visual elements that carry rich semantic information. Mining the textual information in images through visual tasks such as text detection and recognition is of great significance for applications such as image search, automatic translation, and human-computer interaction. Many studies have addressed text detection in complex scenes, and the related methods fall into two main categories: text detection based on traditional methods and text detection based on deep learning. However, most existing methods concentrate on horizontal text, and detecting oriented scene text remains challenging. In addition, the accuracy and speed of existing methods are unsatisfactory, and they struggle to adapt to the many variations of text in complex scenes, such as blurring, low resolution, and occlusion. This thesis focuses on these issues; its research content and main contributions are as follows.

Firstly, a deep neural network architecture that combines deformable convolution with a feature pyramid is designed for scene text detection. We mainly improve the text detection framework SegLink proposed by Bai Xiang et al. Following the idea of the feature pyramid network (FPN), this thesis applies a 3×3 convolution to a shallow feature map, uses a 1×1 convolution to change its number of channels, and then merges the result with the upsampled feature map. This makes the framework more expressive, and the experimental results show that it outperforms the original design. The receptive field of the convolution layers used in common detection frameworks is generally a square box of fixed shape. However, objects in different scenes have different shapes, so a fixed receptive field is limiting, especially for text in complex scenes, whose shape is often an irregular quadrilateral. We therefore add a deformable convolutional layer to the SegLink network so that its receptive field can take arbitrary shapes; this enhances the capacity of the model and improves the results markedly. In addition, drawing on the idea of Mask R-CNN, this thesis adds a layer of mask information to the SegLink framework. The mask is introduced into the network and takes part in training, giving the network better supervision.

Secondly, another deep neural network architecture, based on a residual network and focal loss, is designed for scene text detection. We mainly improve the oriented text detection framework EAST, which performs dense per-pixel prediction in the manner of semantic segmentation. Instead of the PVANet backbone used in the original framework, we use ResNet-50, combine it with RefineNet and other networks, and fuse the features of different layers in several ways to increase the capacity of the network. In addition, this thesis replaces the original balanced loss of EAST with focal loss. The ground truth of EAST is pixel-level: pixels inside text regions are positive samples and pixels outside them are negative samples, so the ratio of positive to negative samples is severely imbalanced. The original EAST weights positive and negative samples separately to balance this ratio, but it makes no distinction between easy and hard examples, so the gradient may be dominated by easy examples. We therefore increase the weight of hard examples and reduce the contribution of easy examples to the loss, which makes training more stable. Finally, we enlarge the output feature map to obtain better results.
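The FPN-style merge used in the first contribution can be illustrated with a minimal pure-Python sketch. It is not the thesis' implementation, only an illustration under stated assumptions: feature maps are small nested lists indexed [channel][row][column], the deeper map is assumed to be exactly half the resolution of the shallow one, upsampling is assumed to be nearest-neighbour, the 3×3 convolution on the shallow map is omitted for brevity, and the names `conv1x1`, `upsample2x`, `merge_top_down` and all weights are hypothetical.

```python
from typing import List, Tuple

# A feature map is stored as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """Return (channels, height, width)."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def conv1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution without bias: output channel o at pixel (y, x) is
    sum_c weights[o][c] * fmap[c][y][x], so only the channel count changes."""
    _, height, width = shape(fmap)
    return [
        [
            [sum(w * fmap[c][y][x] for c, w in enumerate(row)) for x in range(width)]
            for y in range(height)
        ]
        for row in weights
    ]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each value is copied into a 2x2 block."""
    return [
        [[v for v in row for _ in range(2)] for row in channel for _ in range(2)]
        for channel in fmap
    ]


def merge_top_down(deep: FeatureMap, shallow: FeatureMap,
                   lateral_weights: List[List[float]]) -> FeatureMap:
    """FPN-style merge (hypothetical helper): project the shallow map to the deep
    map's channel count with a 1x1 convolution, upsample the deep map 2x,
    and add the two maps element-wise."""
    lateral = conv1x1(shallow, lateral_weights)
    up = upsample2x(deep)
    if shape(lateral) != shape(up):
        raise ValueError("lateral and upsampled maps must have the same shape")
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(lateral, up)
    ]


if __name__ == "__main__":
    deep = [[[1.0]]]                        # 1 channel, 1x1: coarse, deep features
    shallow = [[[1.0, 2.0], [3.0, 4.0]],    # 2 channels, 2x2: finer, shallow features
               [[10.0, 20.0], [30.0, 40.0]]]
    print(merge_top_down(deep, shallow, lateral_weights=[[1.0, 0.5]]))
    # [[[7.0, 13.0], [19.0, 25.0]]]
```

The 1×1 projection is what lets maps with different channel counts be added; in a trained network the lateral weights are learned rather than hand-picked as here.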
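Why a deformable convolution frees the receptive field from a fixed square can be shown at a single output pixel. In this sketch, assuming a single-channel map and a 3×3 kernel, each kernel tap samples at its regular grid position plus a fractional offset, and the value there is read with bilinear interpolation. In an actual deformable layer the offsets are predicted by an additional convolutional layer and can differ per tap; here they are passed in by hand, and the names `bilinear` and `deformable_conv_at` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]

# The regular 3x3 sampling grid of an ordinary convolution, as (dy, dx) offsets.
GRID_3X3: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Bilinearly sample a single-channel map at fractional (y, x).
    Neighbours outside the map contribute zero (zero padding)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def deformable_conv_at(img: Grid, kernel: List[float], y: int, x: int,
                       offsets: List[Tuple[float, float]]) -> float:
    """Output of a 3x3 deformable convolution at pixel (y, x).

    Tap k samples at (y, x) + GRID_3X3[k] + offsets[k]. With all offsets zero
    this is an ordinary 3x3 convolution (cross-correlation) with zero padding.
    """
    out = 0.0
    for w_k, (dy, dx), (oy, ox) in zip(kernel, GRID_3X3, offsets):
        out += w_k * bilinear(img, y + dy + oy, x + dx + ox)
    return out


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 map, values 0..15
    box_filter = [1.0] * 9
    regular = deformable_conv_at(img, box_filter, 1, 1, [(0.0, 0.0)] * 9)
    shifted = deformable_conv_at(img, box_filter, 1, 1, [(0.0, 0.5)] * 9)
    print(regular, shifted)  # 45.0 49.5
```

Because every tap gets its own offset, the nine sampling points can spread along a slanted or irregular quadrilateral instead of a fixed square, which is the property the thesis exploits for oriented text.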
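The switch from EAST's balanced loss to focal loss can be compared on individual pixels. This sketch assumes the class-balanced cross-entropy as we understand it from the EAST paper, with β = 1 − (number of positive pixels / number of pixels), and the standard binary focal loss with α = 0.25 and γ = 2; these are common defaults, and the thesis does not state the values it used. Function names are hypothetical.

```python
import math

EPS = 1e-7  # keeps log() finite when a prediction is exactly 0 or 1


def balanced_bce_pixel(p: float, y: int, beta: float) -> float:
    """Class-balanced cross-entropy for one pixel (EAST-style score-map loss).

    p is the predicted text probability, y the 0/1 label, and
    beta = 1 - (#positive pixels / #pixels) is computed once per score map.
    """
    if y == 1:
        return -beta * math.log(max(p, EPS))
    return -(1.0 - beta) * math.log(max(1.0 - p, EPS))


def focal_pixel(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one pixel.

    p_t is the probability assigned to the true class; the factor
    (1 - p_t) ** gamma shrinks the loss of confidently classified (easy) pixels.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))


if __name__ == "__main__":
    # Toy score map: 2 text pixels out of 20, so beta = 1 - 2/20 = 0.9.
    labels = [1, 1] + [0] * 18
    beta = 1.0 - sum(labels) / len(labels)

    easy_negative = (0.05, 0)  # background pixel confidently predicted as background
    hard_positive = (0.30, 1)  # text pixel predicted with low confidence

    bce_ratio = (balanced_bce_pixel(*easy_negative, beta)
                 / balanced_bce_pixel(*hard_positive, beta))
    fl_ratio = focal_pixel(*easy_negative) / focal_pixel(*hard_positive)
    print(f"easy/hard loss ratio, balanced BCE: {bce_ratio:.5f}")
    print(f"easy/hard loss ratio, focal loss:   {fl_ratio:.5f}")
```

With γ = 0 and α = β the two functions coincide, so it is the (1 − p_t)^γ factor that separates easy from hard pixels: the balanced loss only reweights the two classes, while the focal term also shrinks the share of already well-classified pixels in the gradient.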
Keywords: Multi-oriented Text Detection, Deep Detection Network, EAST