Research On Text Detection In Natural Scenes

Posted on: 2021-04-08 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Sun | Full Text: PDF
GTID: 2428330605950130 | Subject: Signal and Information Processing
Abstract/Summary:
In recent years, living standards have improved greatly, and lifestyles have changed with them. More and more people browse information online across borders, and every day they encounter or produce large numbers of natural scene images that contain important information. The demand for understanding text in natural scene images is therefore increasing, and how to locate and recognize text in such images has become a hot research topic. This thesis focuses on text detection in natural scenes. Because the purpose of text detection is to enable better text recognition, the work concentrates on improving detection accuracy and on how closely the regressed box fits the text region.

The network extracts convolutional features with a four-level encoder-decoder structure. The encoder is divided into four blocks, and the decoder fuses the output features of those four blocks. Following the idea of image segmentation, the network uses a fully convolutional structure to fuse features from different levels and a deconvolution network as the decoder, which restores the feature map to one quarter of the original image size. It then regresses the text region quickly and accurately and outputs the prediction box in one step. This avoids the many processing steps, low efficiency, and high complexity of multi-stage deep learning text detection pipelines.

To improve detection accuracy and the fit of the regressed box, the local structure of the network is modified according to the visual characteristics of the human eye, giving a structure better suited to text in natural scenes. Within each block of the feature extraction network, ideas from Inception, feature fusion, and ASPP are combined to simulate those visual characteristics for text regions, yielding a local network that matches the characteristics of text. The thesis also treats different parts of the text region differently during label generation. Together with the network design, this makes the returned text box fit the text region more closely, improves robustness to complex backgrounds, widens the dynamic range of text region detection, and improves detection accuracy. It also reduces the number of extra and missing boxes in text regions and reduces the noise passed to the recognition network.

For model compression, the network again imitates the human eye: it uses atrous (dilated) convolution, removes redundancy, and reduces the number of layers, which compresses and accelerates the network without reducing detection accuracy. Combined with existing compression methods, the network is further compressed and quantized, shrinking the model from more than 600 megabytes to about 100 megabytes.

The network uses regression to output a text box with a rotation angle in one step. To improve localization accuracy, it combines cross-entropy loss with the Lovász loss, which is based on convex optimization. Experiments show that the proposed method locates text well and achieves good localization results. It effectively resolves the problems of adjacent text instances sticking together and of text being split incorrectly, and the returned text box fits the text region better, which helps the subsequent text recognition step.
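The abstract does not state the kernel sizes or dilation rates used, so the following is only a minimal sketch of why atrous convolution can widen the area a filter sees without adding weights. It relies on the standard relation k_eff = k + (k - 1)(d - 1) for a kernel of size k with dilation rate d. The function names and the rates 1, 6, 12, and 18 are illustrative assumptions, not values taken from the thesis.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    # Dilation inserts (dilation - 1) gaps between each pair of adjacent taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weights_per_filter(kernel_size: int, in_channels: int) -> int:
    """Number of weights in one 2-D filter; dilation does not change it."""
    return kernel_size * kernel_size * in_channels


if __name__ == "__main__":
    # Hypothetical ASPP-style branch rates, chosen only for illustration.
    for rate in (1, 6, 12, 18):
        span = effective_kernel_size(3, rate)
        weights = weights_per_filter(3, 1)
        print(f"3x3 kernel, dilation {rate}: covers {span}x{span} pixels "
              f"with {weights} weights per input channel")
```

In this sketch every branch keeps nine weights per input channel while the covered span grows with the rate, which is one plausible reading of how atrous convolution supports the abstract's claim of reduced redundancy.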
Keywords/Search Tags: Text detection, Natural scene, Codec structure, Feature fusion