Document Image And Scene Text Recognition Based On Deep Learning

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2428330590960953
Subject: Electronic and communication engineering
Abstract/Summary:
Text is a basic tool of social communication and plays an important role in people's lives. With the development of information technology, hundreds of millions of images now circulate throughout the Internet, and people want computers to understand and process these images efficiently. Text information is very important for image understanding, so text image recognition has been an active research direction for decades. This thesis studies two important text image recognition tasks: Chinese document image recognition and Chinese scene text recognition. The two problems are similar and can, in general, both be handled by a sequence recognition model. The main work and contributions of this thesis are as follows:

1. For Chinese document image recognition, this thesis studies text line recognition and word segmentation of ancient Chinese documents. First, a line image is recognized end-to-end by an attention-based sequence recognition model. Next, the recognition model yields an attention weight (probability) distribution for each output word. Finally, the position of each word in the original image is located from this weight distribution and the coordinate mapping between the model's features and the input image, which achieves weakly supervised word segmentation. The proposed method recognizes text lines in ancient documents and gives the approximate position of each word at the same time. It can be applied in an auxiliary annotation (tagging) system to reduce labor costs.

2. For Chinese scene text recognition, a text line recognition model based on a residual convolutional network and a residual recurrent neural network is proposed. Chinese text involves a large number of character categories with complex structures, which calls for deeper networks with more parameters. Deeper networks, however, suffer from vanishing and exploding gradients, which limits the performance of the recognition model. Residual connections help gradients propagate, make the network easier to train, and ultimately yield better recognition results: the word recognition rate increased by 3.76% and the line recognition rate increased by 5.43%.

3. To improve the attention-based model, this thesis proposes replacing the ordinary attention mechanism with a multi-head attention mechanism. By computing the weight probability distribution separately over different segments of the feature channels, multi-head attention can focus on different parts of a Chinese character, which has an effect similar to model ensembling. This method improves Chinese scene text recognition: the word recognition rate increased by 4.87% and the line recognition rate increased by 6.38%.

4. Building on the studies of Chinese document image recognition and Chinese scene text recognition, this thesis compares the performance of the CTC (Connectionist Temporal Classification) model and the attention-based model on the Chinese text recognition task, and analyzes the advantages and disadvantages of each.
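The first contribution locates words by mapping attention weights back to image coordinates. The abstract does not specify that mapping, so the following is only a minimal sketch under stated assumptions. It assumes the encoder downsamples the line image horizontally by a fixed factor (the hypothetical parameter `stride`), so that feature column `t` covers pixels `[t*stride, (t+1)*stride)`. Each decoded word's span is taken as the extent of the columns whose weight reaches a fraction (the hypothetical `rel_threshold`) of that step's peak weight. The function name `locate_characters` is also hypothetical.

```python
from typing import List, Tuple


def locate_characters(attention: List[List[float]], stride: int,
                      rel_threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Map per-step attention weights to approximate horizontal pixel spans.

    attention[i][t] is the weight the decoder places on encoder column t
    when it emits the i-th output symbol (each row is assumed to sum to 1).
    stride is an ASSUMED horizontal downsampling factor of the encoder.
    Returns one (x_start, x_end) pixel interval per decoded symbol.
    """
    spans = []
    for weights in attention:
        peak = max(weights)
        # Keep the columns that receive a substantial share of the attention.
        cols = [t for t, w in enumerate(weights) if w >= rel_threshold * peak]
        # Take the extent of those columns and map it back to input pixels.
        x_start = min(cols) * stride
        x_end = (max(cols) + 1) * stride
        spans.append((x_start, x_end))
    return spans


# Toy attention over 6 encoder columns for 3 decoded symbols.
att = [
    [0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.5, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.4, 0.5],
]
print(locate_characters(att, stride=4))  # [(0, 4), (8, 16), (16, 24)]
```

Only the recognition labels are needed for training. The spans come for free from the attention maps, which is why this kind of segmentation is called weakly supervised.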
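The second contribution relies on the standard argument for why residual connections ease gradient propagation. The following is the textbook derivation for identity shortcuts with no nonlinearity after the addition. It is not necessarily the exact form of the thesis's residual CNN or residual RNN layers. Here $x_l$ is the input to block $l$, $\mathcal{F}$ is the block's residual function with weights $W_l$, and $\mathcal{L}$ is the loss.

```latex
x_{l+1} = x_l + \mathcal{F}(x_l, W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)

\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}\,\frac{\partial x_L}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \left( I + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)
```

The identity term $I$ passes the gradient from layer $L$ to layer $l$ without going through a product of per-layer Jacobians. In a plain deep network, that product is exactly what tends to shrink or blow up with depth.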
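The abstract reports "word recognition rate" and "line recognition rate" without defining them. One common convention, assumed here rather than taken from the thesis, defines the two rates as follows:

- The word (character) rate is one minus the total edit distance divided by the number of reference characters.
- The line rate is the fraction of lines recognized exactly.

The function names below are hypothetical.

```python
from typing import Sequence, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution / match
        prev = cur
    return prev[-1]


def recognition_rates(refs: Sequence[str],
                      hyps: Sequence[str]) -> Tuple[float, float]:
    """ASSUMED metric definitions: (character-level rate, line-level rate)."""
    total_chars = sum(len(r) for r in refs)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    char_rate = 1 - errors / total_chars
    line_rate = sum(r == h for r, h in zip(refs, hyps)) / len(refs)
    return char_rate, line_rate


refs = ["深度学习", "文本识别"]
hyps = ["深度学习", "文字识别"]
print(recognition_rates(refs, hyps))  # (0.875, 0.5)
```

A single wrong character costs only 1/8 of the character rate here, but it costs a whole line at the line level. This is one reason line-level gains can be larger than character-level gains.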
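The third contribution splits the feature channels into segments and computes a separate attention distribution in each. The abstract does not give the scoring function or any projections. This sketch therefore makes the following assumptions:

- It swaps in scaled dot-product scoring.
- It uses no learned projections, so the encoder features serve as both keys and values.
- It splits the channels into equal contiguous segments.

It illustrates the channel-splitting idea only. It is not the thesis's model, and the function name is hypothetical.

```python
import numpy as np


def multi_head_attention(query: np.ndarray, keys: np.ndarray, num_heads: int):
    """Channel-split attention for one decoding step (illustrative sketch).

    query: (d,) decoder state; keys: (T, d) encoder features over T positions.
    Returns context (d,) and per-head weights (num_heads, T).
    """
    T, d = keys.shape
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d // num_heads
    # Split the channel dimension into num_heads contiguous segments.
    q = query.reshape(num_heads, d_head)            # (h, d_head)
    k = keys.reshape(T, num_heads, d_head)          # (T, h, d_head)
    # Each head scores every position using only its own channel segment.
    scores = np.einsum("hc,thc->ht", q, k) / np.sqrt(d_head)  # (h, T)
    scores = scores - scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over T, per head
    # Each head pools its own channel segment; heads are concatenated back.
    context = np.einsum("ht,thc->hc", weights, k).reshape(d)
    return context, weights


rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
state = rng.normal(size=8)
ctx, w = multi_head_attention(state, feats, num_heads=2)
print(ctx.shape, w.shape)  # (8,) (2, 5)
```

Because each head normalizes its own distribution, different heads can peak at different positions. This matches the abstract's description of attending to different parts of a character with an ensemble-like effect.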
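The fourth contribution compares CTC and attention-based models. A key structural difference lies in how each produces its output:

- A CTC model emits one label (or a special blank) per frame, and the output is recovered by collapsing that frame sequence.
- An attention decoder emits characters one at a time, conditioned on its previous outputs.

The sketch below shows CTC best-path (greedy) decoding. It assumes the blank has index 0, which is a convention rather than something stated in the thesis, and it omits beam search. The function name is hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # ASSUMED index of the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most probable label at each time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Merge consecutive duplicates, then remove blank labels.
    return [label for label, _ in groupby(best_path) if label != blank]


# Labels: 0 = blank, 1 = "A", 2 = "B"; 7 frames.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2], i.e. "AAB"
```

The blank between the two runs of label 1 is what lets CTC output a repeated character. By contrast, an attention decoder has no blank and no collapsing step, but it needs an explicit end-of-sequence token.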
Keywords: Document Image Recognition, Scene Text Recognition, Deep Learning, Deep Residual Network, Attention Mechanism