Research On Scene Text Detection And Image Classification Based On Convolutional Neural Network

Posted on: 2020-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2428330596474803
Subject: Electrical engineering
Abstract/Summary:
With the popularity of intelligent mobile terminals, scene images captured by mobile devices have become increasingly common, and they are rich in accurate semantic information. Correct text detection in scenes is an important cornerstone of visual understanding applications such as automatic driving, blind navigation, license plate recognition and text-based image retrieval. In the field of smart grids, detecting warning signs on high-voltage power poles can improve a drone's understanding of its surroundings during inspection. It is therefore necessary to detect the position of text in an image and extract its information.

Because traditional machine learning methods are inefficient and generalize poorly, we propose an end-to-end scene text detection algorithm based on a convolutional neural network (CNN). First, the network is built on the deep residual network (ResNet) architecture. Then, we use ideas from U-Net and RefineNet to fuse feature maps of different levels. Finally, we construct multiple types of loss functions on the multilevel fused feature map. The resulting model achieves fast text detection and precise text localization. During the research, we found that the CNN converges slowly. To address this, and to pave the way for classifying and recognizing the characters extracted by the detection model, we propose an image classification optimization algorithm that jointly uses structural similarity and class information.

The main research work in this thesis includes:

(1) We use the deep residual network, which has strong multi-domain learning ability, as the base network, and remove the fully connected layer to build a fully convolutional network that reduces the number of training parameters and the computing resources required. In addition, the entire network is an end-to-end learning model, which strengthens the supervision of network learning.

(2) This thesis adopts a multi-feature-map fusion learning strategy. Neurons in deeper feature maps have larger receptive fields and are better suited to predicting large targets, while the earlier feature maps are more helpful for predicting small targets. Multi-layer feature map fusion is therefore used to complete the detection task.

(3) Focal loss is introduced to efficiently address the imbalance between positive and negative samples and the selection of hard samples. The sample set contains many negative samples, which are likely to mask the discriminability of the other samples. In this thesis, focal loss dynamically assigns different weights to samples, so that training can fully learn from samples that carry class information and are difficult to distinguish.

(4) A weighted method combining structural similarity and class information is proposed to address the slow convergence of CNN training. First, we construct a CNN that can effectively extract high-level information from small images. Second, we construct a weighted joint structural similarity and class information loss function to train the CNN. Finally, the effectiveness of the designed network structure is verified by image classification experiments on MNIST handwritten digits and CIFAR-10.

Experiments on standard datasets including ICDAR2013, ICDAR2015 and ICDAR2017 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of recall, accuracy and F-measure, with F-measures of 0.8981, 0.8369 and 0.640, respectively. For the image classification network, the experimental results show that the designed network achieves error rates of 0.33% on MNIST and 11% on CIFAR-10. Without data augmentation on the MNIST dataset, the designed network performs much better than all the single networks. On the CIFAR-10 dataset, the designed network achieves higher image classification accuracy with less computation. At the same time, the joint structural similarity and class information loss speeds up the network training process.
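
The dynamic sample weighting attributed to focal loss above can be illustrated with a small sketch. The abstract does not give the exact formulation used in the thesis, so this assumes the commonly used binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults gamma = 2.0 and alpha = 0.25 are illustrative assumptions, not values taken from the thesis.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one prediction.

    p: predicted probability of the positive (text) class.
    y: ground-truth label, 1 for text and 0 for background.
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction (assumed standard form).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples, so the many easy negatives contribute little to training.
    gamma and alpha are hypothetical defaults, not the thesis settings.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Two background pixels: one easy (low text score), one hard (high text score).
    samples = {"easy negative": (0.05, 0), "hard negative": (0.70, 0)}
    for name, (p, y) in samples.items():
        ce = cross_entropy(p, y)
        fl = focal_loss(p, y)
        print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}, ratio={fl / ce:.4f}")
```

Under these assumptions, the focal-to-cross-entropy ratio is far smaller for the easy negative than for the hard one, which is the re-weighting effect that item (3) describes.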
Keywords/Search Tags: scene text detection, CNN, feature fusion, focal loss, structure similarity, class information