
Research On End-to-End Text Recognition In Natural Scene

Posted on: 2021-04-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Deng
GTID: 1368330647960707
Subject: Signal and Information Processing
Abstract/Summary:
Text recognition and its related problems have long been both popular and challenging topics in computer vision. The technology is widely used in language translation, driver assistance, geolocation, image retrieval, and many other applications, and researchers began studying it decades ago. Recognition techniques for document images have matured, but they still face major challenges on scene images. Scene text appears in highly variable forms, combining different fonts, scales, shapes, colors, typography, and so on, while complex backgrounds and unconstrained imaging conditions make recognition even more difficult. This dissertation presents a comprehensive, in-depth study of text detection, text recognition, and related problems in scene images, with a focus on concise and efficient methods. The aim is to propose new solutions to shortcomings in current research, and to validate and deploy them in related scenarios to demonstrate their versatility and practicality. The main contributions of this dissertation are as follows.

First, to reduce the dependence on anchor design, this dissertation proposes a simple and efficient real-time text detection network that needs only one basic reference box at each detection position. Its key feature is a learning mechanism introduced into the one-stage detection framework: anchors refined by regression (learned anchors) replace the initial anchors in the final prediction. The network achieved excellent detection accuracy on multiple public benchmarks and outperformed all anchor-based detection methods of the same period.

Second, this dissertation proposes a two-stage, multi-oriented text detection network that does not rely on any prior knowledge. Rather than sliding anchors across the entire image to estimate the likely location and shape of text, it generates candidates by locating the four corners of each text bounding box and linking them. Quadrilateral candidates generated from corners are geometrically adaptive, which makes the detector relatively insensitive to text scale and shape. In addition, we propose a pooling layer called Dual-RoI Pooling, a data augmentation module built into the network that makes more effective use of the training data and steadily improves detection robustness. Results on multiple public datasets demonstrate the effectiveness of the method, which is also highly competitive in detection efficiency.

Third, we argue that the main factor limiting recognition accuracy on irregular text images is that the background occupies a relatively large proportion of the image, so the fixed receptive fields and sampling points of standard convolution introduce redundant, irrelevant information. Exploiting the adjustable geometric structure of deformable convolutional layers, this dissertation proposes a focus-enhanced recognition network that requires no additional operations. Through end-to-end training, the convolution kernels learn to adjust their sampling positions and thus extract more representative, relevant features. Test results on multiple public datasets show that the network improves on the baseline and achieved excellent recognition accuracy at the time.

Fourth, deep-learning-based text recognition networks require large amounts of labeled training data, but existing methods for synthesizing text sequence images usually involve a series of relatively complicated processing steps. This dissertation treats the generation of sequence images as an image-to-image translation problem and uses a generative adversarial network to convert a simple semantic image of a character sequence into a realistic scene text image. The entire process takes only two steps. Multiple evaluation metrics for the generated images, together with the resulting text recognition accuracy, demonstrate the effectiveness of the method.

Fifth, building on the previous work, this dissertation proposes a complete, general-purpose end-to-end text recognition network. It integrates multiple related tasks and performs text detection and recognition simultaneously in a single forward pass. Multiple branch networks share convolutional features, and multi-task training lets the network extract more task-specific features. Thanks to a lightweight backbone and a concise branch architecture, the network runs fully in real time while recognizing text accurately. We also applied it to license plate recognition, where it achieved excellent recognition accuracy without changing most of the network parameters, demonstrating its versatility.

Finally, the code and data for all published work in this dissertation have been open-sourced; see each chapter for details.
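The first contribution replaces a hand-designed anchor set with a single reference box per detection position, refined by regression into a learned anchor that then stands in for the initial anchor in the final prediction. The abstract does not give the box encoding, so the Python sketch below assumes the common center/log-size parameterization (dx, dy, dw, dh); `Box`, `decode`, and `predict_with_learned_anchor` are hypothetical names for illustration, not the dissertation's code.

```python
import math
from typing import NamedTuple, Tuple

Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


class Box(NamedTuple):
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


def decode(anchor: Box, deltas: Deltas) -> Box:
    """Apply regression offsets to a box.

    Assumed encoding: center shifts are scaled by the anchor size,
    width and height are multiplied by exp(dw) and exp(dh).
    """
    dx, dy, dw, dh = deltas
    return Box(
        cx=anchor.cx + dx * anchor.w,
        cy=anchor.cy + dy * anchor.h,
        w=anchor.w * math.exp(dw),
        h=anchor.h * math.exp(dh),
    )


def predict_with_learned_anchor(reference: Box,
                                anchor_deltas: Deltas,
                                box_deltas: Deltas) -> Box:
    # Step 1: regress the single reference box at this position
    # into a "learned anchor".
    learned_anchor = decode(reference, anchor_deltas)
    # Step 2: the learned anchor replaces the initial reference box,
    # and the final box is regressed relative to it.
    return decode(learned_anchor, box_deltas)


# Usage: one 32x32 reference box centered at (16, 16).
ref = Box(16.0, 16.0, 32.0, 32.0)
final = predict_with_learned_anchor(ref, (0.25, 0.0, math.log(2.0), 0.0),
                                    (0.0, 0.5, 0.0, 0.0))
# final is approximately Box(cx=24.0, cy=32.0, w=64.0, h=32.0)
```

Because both steps share one decoding rule, the network only has to learn how far to move each box, not which of many preset anchor shapes to start from.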
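For the second contribution, candidates are generated by locating the four corners of each text bounding box and linking them into quadrilaterals. The abstract does not describe the linking rule, so the sketch below is an assumed, brute-force version: it combines one top-left, top-right, bottom-right, and bottom-left corner, keeps only convex, consistently ordered quadrilaterals, and ranks them by mean corner score. `cross`, `is_valid_quad`, and `link_corners` are hypothetical names.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) in image coordinates, y pointing down
Corner = Tuple[Point, float]  # (location, confidence score)
Quad = List[Point]            # ordered TL, TR, BR, BL


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_valid_quad(quad: Quad) -> bool:
    """True if TL -> TR -> BR -> BL is a convex, consistently ordered polygon.

    With y pointing down this order turns the same way at every vertex, so
    all four cross products are positive; mis-ordered or self-intersecting
    combinations fail the check.
    """
    return all(cross(quad[i], quad[(i + 1) % 4], quad[(i + 2) % 4]) > 0
               for i in range(4))


def link_corners(tl: List[Corner], tr: List[Corner],
                 br: List[Corner], bl: List[Corner]) -> List[Tuple[Quad, float]]:
    """Hypothetical brute-force linking of four corner types into quadrilaterals."""
    candidates = []
    for (p1, s1), (p2, s2), (p3, s3), (p4, s4) in product(tl, tr, br, bl):
        quad = [p1, p2, p3, p4]
        if is_valid_quad(quad):
            # Rank candidates by the mean confidence of their four corners.
            candidates.append((quad, (s1 + s2 + s3 + s4) / 4))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates


# Usage: corners of one slightly rotated text instance.
quads = link_corners([((0.0, 0.0), 0.9)], [((10.0, 2.0), 0.8)],
                     [((9.0, 7.0), 0.7)], [((-1.0, 5.0), 0.6)])
```

Since every combination is tried, the cost grows with the product of the four corner counts, and corners from different text instances can still form valid-looking quadrilaterals; in a two-stage detector, such spurious candidates would presumably be scored and rejected by the second stage.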
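The third contribution relies on deformable convolution, whose kernels learn where to sample. The pure-Python sketch below shows the generic sampling rule for one output position of a single-channel 3x3 deformable convolution: each kernel tap reads the input at its regular grid location plus a (dy, dx) offset, using bilinear interpolation for fractional positions. It illustrates the standard technique under simplifying assumptions, not the focus-enhanced network itself: the offsets are passed in directly rather than predicted by a layer, and `bilinear` and `deform_conv3x3_at` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Regular 3x3 sampling grid, row-major: (-1, -1), (-1, 0), ..., (1, 1).
GRID = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at a fractional (y, x); neighbors outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deform_conv3x3_at(img: Grid, kernel: Grid, p0: Tuple[int, int],
                      offsets: Sequence[Tuple[float, float]]) -> float:
    """One output value of a single-channel 3x3 deformable convolution at p0.

    offsets holds one (dy, dx) pair per kernel tap, in GRID order. In a real
    network they are predicted by an extra conv layer and learned end to end.
    """
    y0, x0 = p0
    out = 0.0
    for (gy, gx), (dy, dx) in zip(GRID, offsets):
        # Regular grid position plus offset, sampled bilinearly.
        out += kernel[gy + 1][gx + 1] * bilinear(img, y0 + gy + dy, x0 + gx + dx)
    return out


# Usage: a 5x5 ramp image and an all-ones kernel.
img = [[float(5 * r + c) for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
plain = deform_conv3x3_at(img, ones, (2, 2), [(0.0, 0.0)] * 9)
shifted = deform_conv3x3_at(img, ones, (2, 2), [(0.5, 0.0)] * 9)
```

With all offsets at zero the result equals an ordinary 3x3 convolution; learned offsets let each tap move toward the text strokes. In practice one would use an optimized operator such as `torchvision.ops.deform_conv2d` rather than per-pixel Python loops.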
Keywords/Search Tags: Scene Text, Text Detection, Text Recognition, End-to-End Recognition, Deep Learning