
Deep Network Model Based On Feature Layer Fusion Of Visual Information And Linguistic Information For Handwritten Chinese Text Recognition

Posted on: 2020-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Xiu
Full Text: PDF
GTID: 2428330596468144
Subject: Computer Science and Technology
Abstract/Summary:
Handwritten Chinese text recognition (HCTR) is one of the research hotspots and difficulties in the fields of computer vision and pattern recognition. The rise of deep learning has provided new research methods for HCTR. Most existing deep learning based methods first train a recognition model on visual information and then fuse its outputs with a language model to improve recognition performance, i.e., decision-layer fusion of visual and linguistic information. This thesis instead starts from the perspective of feature-layer fusion of visual and linguistic information and constructs deep network models based on deep learning and reinforcement learning to solve the HCTR problem, aiming to improve recognition performance through multimodal joint representations with rich semantic information. The main work of this thesis includes:

(1) This thesis studies the application of the attention-based encoder-decoder model to HCTR. We construct a character-level feature-layer fusion based HCTR model by combining a character-level visual and linguistic feature fusion module with a long short-term memory (LSTM) based decoder. Since the visual and linguistic information differ at the expression and semantic levels, we explore three character-level feature-layer fusion methods, namely vector addition, vector concatenation and a gated mechanism, which effectively learn a multimodal joint representation for each Chinese character. The experimental results demonstrate the effectiveness of feature-layer fusion of visual and linguistic information and show that the gated mechanism outperforms the other two methods.

(2) Building on the character-level fusion methods, we establish a multi-level feature-layer fusion based HCTR model that uses a convolutional neural network to fuse the visual and linguistic information contained in the historical text fragment predicted by the LSTM-based decoder. By modeling the historical text fragments effectively, this addresses the decoder's problems of vanishing gradients, exposure bias, and the inability to capture hierarchical structure information in text sequences. The experimental results show that the proposed method achieves results comparable to state-of-the-art approaches and yields a relative character-level accuracy improvement of 3.34% over the character-level feature-layer fusion based HCTR model.

(3) This thesis explores reinforcement learning methods for optimizing the HCTR model. We construct a deep reinforcement learning based HCTR model that is guided to learn a better sequence decision process by the evaluation metric used in the test phase. Specifically, we adopt the policy gradient method and take the test-phase evaluation metric as the reward to fine-tune the parameters of the multi-level feature-layer fusion based HCTR model. Reinforcement learning thereby addresses the exposure bias and loss-evaluation mismatch problems faced by the encoder-decoder model. The experimental results show that reinforcement learning effectively improves recognition performance.
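The sketch below illustrates the three character-level fusion strategies named in (1): vector addition, vector concatenation, and a gated mechanism that weighs visual against linguistic features per dimension. It is a minimal PyTorch-style sketch under assumed dimensions and layer choices; the class name, projection layers, and interface are illustrative, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class CharFeatureFusion(nn.Module):
    """Fuses a visual feature vector and a linguistic (character-embedding)
    feature vector into one joint representation per character.
    Hypothetical sketch: dimensions and layer choices are assumptions."""

    def __init__(self, dim, mode="gate"):
        super().__init__()
        self.mode = mode
        if mode == "concat":
            # project the concatenated vector back to the common dimension
            self.proj = nn.Linear(2 * dim, dim)
        elif mode == "gate":
            # the gate decides, per dimension, how much visual vs. linguistic
            # information enters the joint representation
            self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual, linguistic):
        if self.mode == "add":                      # vector addition
            return visual + linguistic
        if self.mode == "concat":                   # vector concatenation
            return self.proj(torch.cat([visual, linguistic], dim=-1))
        g = torch.sigmoid(self.gate(torch.cat([visual, linguistic], dim=-1)))
        return g * visual + (1.0 - g) * linguistic  # gated mechanism

# Example: fuse a batch of 8 characters with 256-dimensional features
fusion = CharFeatureFusion(dim=256, mode="gate")
joint = fusion(torch.randn(8, 256), torch.randn(8, 256))
```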
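For the multi-level fusion in (2), one plausible reading is a small 1-D CNN that encodes the fragment of characters already predicted by the decoder into a context vector, which can then be fused with the visual attention context at each decoding step. The following is a sketch under that assumption; the kernel sizes, pooling scheme, and vocabulary size are invented for illustration.

```python
import torch
import torch.nn as nn

class HistoryFragmentEncoder(nn.Module):
    """Encodes the historical text fragment predicted so far with a 1-D CNN,
    producing a fixed-size linguistic context vector per decoding step.
    Hypothetical sketch: kernel sizes and pooling are assumptions."""

    def __init__(self, vocab_size, emb_dim=256, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, hidden, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)   # windows over character-, word- and phrase-like spans
        )
        self.out = nn.Linear(3 * hidden, hidden)

    def forward(self, history_ids):
        # history_ids: (batch, length) indices of previously predicted characters
        x = self.embed(history_ids).transpose(1, 2)            # (batch, emb, length)
        pooled = [torch.relu(c(x)).max(dim=-1).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=-1))             # (batch, hidden)

# Example: a batch of 4 decoding histories, each 10 characters long
enc = HistoryFragmentEncoder(vocab_size=7000)
ctx = enc(torch.randint(0, 7000, (4, 10)))
```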
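Finally, a minimal sketch of the policy-gradient fine-tuning described in (3): sample a transcription from the model, score it with the test-phase evaluation metric, and use that score as the reward. The `model.sample` and `reward_fn` interfaces, and the self-critical greedy baseline used to reduce variance, are assumptions layered on top of the abstract's description, not the thesis's actual training code.

```python
import torch

def policy_gradient_step(model, optimizer, images, targets, reward_fn):
    """One REINFORCE-style fine-tuning step.
    Assumed interfaces: model.sample(images, greedy=False) returns
    (char ids, per-character log-probabilities); reward_fn(ids, targets)
    returns a per-sample score tensor (e.g. character accuracy)."""
    sampled_ids, log_probs = model.sample(images)            # stochastic decoding
    with torch.no_grad():
        greedy_ids, _ = model.sample(images, greedy=True)
        baseline = reward_fn(greedy_ids, targets)            # self-critical baseline
    reward = reward_fn(sampled_ids, targets)                 # test-phase metric as reward
    advantage = (reward - baseline).unsqueeze(-1)            # (batch, 1)
    loss = -(advantage * log_probs).sum(dim=-1).mean()       # maximize expected reward

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), reward.mean().item()
```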
Keywords/Search Tags: handwritten Chinese text recognition, feature layer fusion, deep learning, reinforcement learning, multimodal joint representation