
Scene Image Text Detection Research Based On Deep-Learning And Its Application

Posted on: 2021-03-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Zhang
GTID: 1368330611967145
Subject: Information and Communication Engineering
Abstract/Summary:
Text detection in natural scene images refers to the automatic localization and recognition of text in natural scene environments by machine vision, together with the techniques for obtaining the related attributes of scene text. It is widely applied in real-time scene-language translation, machine semi-automation, dyslexia assistance, image content understanding, visual information retrieval, medically assisted diagnosis, automatic or assisted driving, and other related fields, and therefore has great potential economic value. It has also greatly promoted the development of artificial intelligence, pattern recognition, and other active research fields.

Text detection in natural scene images is a very challenging research topic. The main difficulties lie in complex and noisy backgrounds, low resolution, varied text shapes and changeable appearance, rich font colors, and random sample distribution. To address this problem, this thesis moves away from traditional methods, which suffer from complex hand-crafted text feature design and low accuracy, and mainly explores the application of deep learning to scene text processing. In addition, for the applied problem of license plates as a complex kind of natural scene text, we put forward a real-time and robust license plate recognition model based on dual attention transformation and a shared adversarial training network (SATN) that uses the prior knowledge of standard stencil-rendered license plates. Specifically, the research objectives and innovations of this thesis are as follows:

1) To overcome the complex hand-crafted text feature design and low accuracy of traditional methods, this thesis proposes a text feature enhancement strategy based on fully convolutional networks, adaptive position-sensitive RoI pooling, and positive mining. The text feature enhancement strategy changes the single-branch feature extraction of text proposals in existing fully convolutional networks into multi-branch feature extraction, with up- and down-sampling based on bilinear interpolation and customized convolution kernels matched to text aspect ratios, and it replaces serial residual learning with parallel residual learning. For texts with large variations in scale and aspect ratio, text-specific convolution kernels are used to generate different “position-sensitive feature maps”, which are then pooled; an adaptive weight is learned for each pooling result to make the detection results more accurate. Finally, in the positive mining strategy, we repeatedly sample around the positive samples with different scales, aspect ratios, and random center offsets, in order to adjust the ratio of positive to negative samples and improve text detection accuracy.

2) For arbitrary-shape text detection, covering horizontal, oblique, curved, and wavy scene texts, this thesis first proposes to generate an appropriate number of effective text proposals based on omnidirectional pyramid masks. This does not need the NMS algorithm to suppress redundant proposals and resolves the dilemma of “stack omnidirectional text suppression”. In addition, text features are extracted better by combining the modeling of pyramid lengthwise and sidewise residual sequences, which expands the receptive field of each feature layer of the fully convolutional network and integrates more context information. Secondly, multiple improved deformable convolutions are used to fit arbitrary scene text shapes. Finally, a multi-granularity arbitrary-shape text classification module outputs the accurate detection results for arbitrary-shape scene texts.

3) Accurate recognition of license plates in natural environments is still a challenging task, because license plates in the wild vary in appearance and suffer from blurring, noise, perspective distortion, uneven illumination, and similar degradations. The thesis proposes a novel dual attention transformation (DAT) module to correct the features of perspectively distorted wild license plates, which benefits the subsequent recognition. More importantly, to make the model learn environment-independent and perspective-free semantic features effectively and efficiently, we put forward a shared adversarial training network (SATN) that uses the prior knowledge of standard stencil-rendered license plates. The proposed method outperforms previous state-of-the-art methods by a large margin on the AOLP-RP and CCPD benchmarks.
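
As an illustration of the positive mining idea in contribution 1), sampling extra boxes around a positive sample with varied scale, aspect ratio, and random center offset could be sketched as below. This is a minimal sketch and not the thesis implementation: the function names `iou` and `mine_positives`, the sampling ranges, the number of trials, and the IoU threshold used to decide whether a jittered box still counts as positive are all assumptions made for the example.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_positives(gt_box, n_trials=20, scale_range=(0.8, 1.2),
                   ratio_range=(0.8, 1.2), max_offset=0.1,
                   iou_thresh=0.5, rng=None):
    """Sample jittered boxes around one ground-truth text box.

    All default values are hypothetical; the abstract does not state them.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = gt_box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    mined = []
    for _ in range(n_trials):
        s = rng.uniform(*scale_range)   # overall scale factor
        r = rng.uniform(*ratio_range)   # aspect-ratio factor
        new_w = w * s * r ** 0.5        # widen by sqrt(r) ...
        new_h = h * s / r ** 0.5        # ... and shorten by sqrt(r)
        # Random center offset, expressed as a fraction of the box size.
        new_cx = cx + rng.uniform(-max_offset, max_offset) * w
        new_cy = cy + rng.uniform(-max_offset, max_offset) * h
        box = (new_cx - new_w / 2, new_cy - new_h / 2,
               new_cx + new_w / 2, new_cy + new_h / 2)
        # Assumed criterion: keep only boxes that still overlap the
        # ground truth enough to be treated as positives.
        if iou(box, gt_box) >= iou_thresh:
            mined.append(box)
    return mined
```

Calling `mine_positives((10.0, 20.0, 110.0, 50.0), rng=random.Random(0))` returns a list of extra positive boxes for a single text instance. Adding such boxes to the training set raises the share of positives relative to negatives, which is the stated purpose of the strategy.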
Keywords/Search Tags:Scene text detection, deep learning, vision computation, adaptive position-sensitive RoI pooling, license plate recognition, dual attention transformation, positive mining, generative adversarial network