
Natural Scene Text Detection And Recognition System Based On YOLO-v3 Network

Posted on: 2019-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: S H Fan
Full Text: PDF
GTID: 2428330578980115
Subject: Optical Engineering
Abstract/Summary:
This thesis proposes a natural scene text detection and recognition system based on the YOLO-v3 network. It explores three key problems (training samples, text detection, and character recognition) and provides a solution for each. The main work is as follows.

(1) To address the shortage of training samples for deep learning networks, a text generator is designed to produce text images in large quantities. The thesis first summarizes the characteristics of text in natural scenes, then designs a functional module that simulates and generates natural scene text images, so that training samples for the system are generated automatically.

(2) To handle the variability of characters in natural scene text detection, the number of anchors in YOLO-v3 and their width and height dimensions are redefined by clustering the character candidate boxes found in natural scenes. In addition, the 52*52 detection map, one of the original YOLO-v3 feature maps, is removed from the network structure, which speeds up image processing by the YOLO-v3 model while maintaining accuracy. During training, the image size is adjusted adaptively, and training on images of different sizes improves the generalization ability of the detection model.

(3) Because Chinese character recognition differs from handwritten digit recognition, the thesis improves the traditional LeNet-5. The input image size is changed from 28*28 to 64*64, which matches the typical size of Chinese characters in natural scenes. Because the thesis covers 3755 classes of Chinese characters, the 7-layer LeNet-5 structure is extended to 11 layers so that the network can cope with this multi-class task and learn sufficiently distinctive features. The convolution layers of the extended network differ from those of LeNet-5 in that only the number of filters changes while the image size stays the same. These layers extend the feature learning of the original model so that it can learn more information, which supports the accuracy of the final Chinese character classification.

Although this work on sample generation, text detection, and character recognition improves the performance of text detection and recognition in natural scenes and solves some problems, the diversity and complexity of natural scenes leave the system considerable room for improvement, which will require much further research and exploration. As the technology continues to develop, the author believes that natural scene text detection and recognition systems will see better practical application in the near future.
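The anchor redefinition in point (2) can be illustrated with a small sketch. The abstract says only that the character candidate boxes are clustered; it does not name the algorithm or the distance measure. The sketch below assumes k-means with a 1 - IoU distance over (width, height) pairs, which is the approach commonly used for YOLO anchors. The function names `iou_wh` and `kmeans_anchors`, the value `k=3`, and the box sizes are all hypothetical and are not taken from the thesis.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes placed with a shared top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors.

    Assumption: distance is 1 - IoU, so each box joins the anchor
    it overlaps most. The thesis may use a different criterion.
    """
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k distinct boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:
                # keep the previous anchor if no box was assigned to it
                new_anchors.append(anchors[i])
                continue
            mean_w = sum(m[0] for m in members) / len(members)
            mean_h = sum(m[1] for m in members) / len(members)
            new_anchors.append((mean_w, mean_h))
        if new_anchors == anchors:
            break  # assignments are stable
        anchors = new_anchors
    # report anchors from smallest to largest area
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical character/text box sizes in pixels (not data from the thesis).
boxes = [(120, 30), (130, 28), (125, 32),
         (20, 22), (18, 20), (22, 24),
         (300, 40), (310, 45), (290, 38)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```

In a Darknet-style configuration the resulting (width, height) pairs would replace the default anchor list, with the anchor count matched to the detection scales that remain after the 52*52 map is removed.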
Keywords/Search Tags:deep learning, YOLO, text detection, character recognition