
Research On Handwritten Text Recognition And Translation Based On Deep Attention Mechanism

Posted on: 2020-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S. J. Zhou
Full Text: PDF
GTID: 2428330590973246
Subject: Software engineering

Abstract/Summary:
In the field of text recognition, convolutional neural networks and recurrent neural networks are usually combined to recognize text lines; the combined architecture is called a convolutional recurrent neural network (CRNN). However, the serial computation of the recurrent component leads to long training times. This thesis studies handwritten text recognition models based on a deep attention mechanism, aiming to reduce the time cost of training while maintaining recognition accuracy. Building on that work, two-stage and end-to-end methods for image text recognition are studied further.

Two handwritten text recognition models, CANN and CNN-Transformer, are proposed; both replace the recurrent network structure with attention mechanisms. The CANN model extracts sequence features with inner-product-based self-attention instead of a recurrent network and is trained with the Connectionist Temporal Classification (CTC) algorithm. The CNN-Transformer model, built on the encoder-decoder framework, casts text recognition as a sequence-to-sequence problem; both its encoder and decoder use inner-product attention, which can be computed in parallel. Experiments on the public handwriting datasets IAM and SCUT-EPT verify the effectiveness of both models.

The working principle of the neural network's Softmax classifier is also studied in depth, and the working mechanism of the classifier is interpreted as an application of the attention mechanism. Several methods of constraining the classification-layer weights are explored to improve the model's generalization ability: minimizing the inner products between class-center vectors of different classes, fixing the class-center vectors, adding an L2 constraint to the hidden-layer vectors, and initializing the class-center vectors with an orthogonal vector group.

Finally, this thesis studies combining a text recognition model with a machine translation model for image
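The parallelism claimed for the attention models above comes from the fact that inner-product self-attention is a few matrix products over the whole sequence at once, rather than a step-by-step recurrence. The following is a minimal NumPy sketch of that computation; the function name and the toy dimensions are illustrative, not taken from the thesis.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Self-attention over a feature sequence x of shape (T, d_model).

    Unlike an RNN, every output position is computed from the whole
    sequence with batched matrix products, so the T positions are
    processed in parallel rather than serially.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # (T, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (T, T) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (T, d_k) context features

# Toy example: 5 time steps of 8-dim CNN column features.
rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 4
x = rng.standard_normal((T, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = scaled_dot_product_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

In a recognition model of the kind described, `x` would be the column-feature sequence produced by the CNN backbone, and the attention output would feed either a CTC layer (as in CANN) or a Transformer decoder (as in CNN-Transformer).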
text recognition and translation. A two-stage method and an end-to-end method are proposed, and the end-to-end model is trained with transfer learning. Experiments on synthetic datasets compare the advantages and disadvantages of the two approaches.

The experimental results show that the proposed CANN model trains markedly faster than the CRNN model. All four methods for improving generalization are effective, with the L2 constraint on the hidden-layer vectors achieving the best results. The proposed CNN-Transformer model achieves higher recognition accuracy on Chinese handwriting datasets than the classical CRNN model. When little end-to-end training data is available, the two-stage method outperforms the end-to-end method; when the two-stage recognition and translation models are trained on the same data as the end-to-end model, the end-to-end model has the advantage.
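Two of the classifier-weight constraints mentioned above, orthogonal initialization of the class centers and the L2 constraint on hidden-layer vectors, can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions (QR-based orthogonalization, a fixed scaling factor `alpha`), not the thesis's exact implementation.

```python
import numpy as np

def orthogonal_class_centers(num_classes, dim, seed=0):
    """Initialize classifier weight rows as an orthogonal vector group
    via QR decomposition (assumes num_classes <= dim), so inner products
    between distinct class centers start at zero."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((dim, num_classes))
    q, _ = np.linalg.qr(a)   # columns of q are orthonormal
    return q.T               # (num_classes, dim) weight matrix

def l2_constrained_logits(h, w, alpha=16.0):
    """Project the hidden vector onto a sphere of radius alpha before the
    inner product with the class centers (an L2 constraint on h), so the
    logits depend on direction rather than magnitude."""
    h = alpha * h / np.linalg.norm(h)
    return w @ h

w = orthogonal_class_centers(10, 64)
gram = w @ w.T                       # pairwise inner products of centers
logits = l2_constrained_logits(np.ones(64), w)
print(np.allclose(gram, np.eye(10)), logits.shape)
```

By construction the Gram matrix of the centers is the identity, i.e. the inter-class inner products being minimized are exactly zero at initialization.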
Keywords/Search Tags: handwritten recognition, deep learning, attention mechanism, machine translation, end-to-end model