
Research On End-to-end Scene Text Recognition Method Based On Deep Learning

Posted on: 2019-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Liu
GTID: 2428330566486087
Subject: Signal and Information Processing
Abstract/Summary:
Images convey rich information, and text, as the crystallization of human wisdom, often carries far more information than color or texture does, so recognizing and understanding the text in scene images is both necessary and important. However, because text images in natural scenes are more complex than traditional scanned document images, traditional OCR is no longer adequate for this new challenge. Recent breakthroughs in artificial intelligence and computer science have made deep-learning-based scene text recognition more advanced than traditional OCR technology, but a large gap remains before practical application. The study of deep-learning-based scene text recognition in this thesis therefore has theoretical significance and broad application prospects.

This thesis studies the recognition of Chinese text in natural scene images and proposes an end-to-end scene text recognition model and method based on convolutional and recurrent neural networks. Compared with traditional text recognition methods, the model and method have better feature learning and feature classification capabilities. The main work of this thesis includes:

1. A scene text image feature extraction model based on a deformable convolutional network is proposed. The model uses a deformable convolutional neural network to extract text image features automatically. Compared with other models, it learns features better and is more robust when recognizing complex scene text images, especially when the characters in the image have undergone a geometric transformation. With the proposed feature extraction model, the features of scene text images are extracted well and text recognition performance is effectively improved.

2. The attention mechanism in the Encoder-Decoder framework is improved. An ordinary attention mechanism usually decodes with global attention, so the input at the current time step is the weighted sum of all the original inputs. The improved attention mechanism uses local attention, and the input at the current time step is the result of a weighted convolution over the original input: the local input is weighted first, and a convolution is then applied to generate the new input. With this mechanism, the network model adapts better to the complex structure of Chinese characters. The improved attention mechanism raises text recognition accuracy by 0.5%.

3. An improved post-processing operation for the decoding output is proposed. Existing post-processing operations usually use a pure search algorithm or a search algorithm that incorporates a simple language model. Some of these algorithms perform poorly because the search is too simple, and others take a long time because the search is too complex. The improved post-processing operation reduces the search space and search time without reducing decoding performance, and it incorporates an effective statistical language model. Experimental results show that the improved post-processing improves both decoding efficiency and decoding accuracy.

4. A data augmentation method for complex text images in natural scenes is proposed. The method models a small amount of real-world scene text and synthesizes scene text images that are closer to real text images in font, color, noise, and affine distortion. With this data augmentation method, data sets that meet specific needs can be synthesized quickly, which reduces the manpower and material resources needed for data collection.

5. An encoding and decoding model based on a two-dimensional recurrent network is proposed. The model achieves end-to-end text recognition, avoids reducing the dimension of the text image features, and makes use of character structure information. In the traditional Encoder-Decoder framework, a one-dimensional recurrent neural network is usually the core structure for encoding and decoding, and a one-dimensional recurrent neural network is only suitable for sequence recognition. To use the Encoder-Decoder framework for text recognition, it is therefore usually necessary to reduce the feature map of a 2D text image to one-dimensional sequences and feed them into the Encoder-Decoder framework. This operation severely damages the spatial structure of Chinese characters and loses a large part of the spatial structure features. In this thesis, a two-dimensional recurrent network is used as the core of the Encoder-Decoder framework, so it can be connected directly to the feature map extracted by the deep convolutional network. The resulting Encoder-Decoder framework takes advantage of the spatial structure of Chinese characters and is also more robust to deformations along the vertical axis of text images. Experimental results show that, compared with a one-dimensional recurrent network, using the two-dimensional recurrent network for encoding and decoding improves text recognition accuracy by 2.6% and achieves a 78.6% recognition rate. Compared with the standard two-dimensional recurrent network, the two-dimensional recurrent network in this thesis is close in performance while offering fast computation and a simple network model design...
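The difference between global attention and the local, convolution-based attention described in item 2 can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the window parameter `half_width`, and the per-position scalar `kernel` (a stand-in for learned convolution weights) are all assumptions made for illustration, and plain Python lists are used in place of a deep learning framework.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(features, scores):
    """Context vector = weighted sum of ALL encoder feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def local_conv_attention(features, scores, center, half_width, kernel):
    """Weight only a window around `center`, then convolve the weighted window.

    `kernel` holds one scalar tap per window position (2 * half_width + 1 taps).
    It is a hypothetical stand-in for learned convolution weights.
    """
    lo = max(0, center - half_width)
    hi = min(len(features), center + half_width + 1)
    weights = softmax(scores[lo:hi])
    weighted = [[w * x for x in f] for w, f in zip(weights, features[lo:hi])]
    # Keep kernel taps aligned with positions when the window is clipped at a border.
    offset = lo - (center - half_width)
    taps = kernel[offset:offset + len(weighted)]
    dim = len(features[0])
    return [sum(k * v[d] for k, v in zip(taps, weighted)) for d in range(dim)]


if __name__ == "__main__":
    # Four encoder positions with 2-dimensional features (toy values).
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    scores = [0.0, 0.0, 0.0, 0.0]
    print(global_attention(feats, scores))
    print(local_conv_attention(feats, scores, center=1, half_width=1,
                               kernel=[1.0, 1.0, 1.0]))
```

With an all-ones kernel the local variant reduces to a weighted sum over the window only. A non-uniform kernel lets the context depend on where each feature sits inside the window, which is one plausible reading of why the abstract reports that the mechanism suits the structure of Chinese characters.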
Keywords/Search Tags: text recognition, deep learning, convolutional neural network, Encoder-Decoder, attention mechanism