Research On Historical Character Recognition Based On Data Enhancement Technology

Posted on: 2022-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Han
GTID: 2518306485481224
Subject: Electrical engineering
Abstract/Summary:
Ancient literature is an important bridge for transmitting traditional culture and the wisdom of earlier generations, and an important tool for studying ancient history. Ancient books contain a huge variety of characters, and the volume of surviving documents and materials is large. To better preserve these precious products of spiritual civilization, the digitization of ancient books and documents has attracted increasing attention and become an important research topic. In this thesis, the published CASIA-AHCDB dataset, collected from the Complete Library in Four Sections and ancient Buddhist sutras, is used for evaluation. The dataset contains variant characters with very few samples, and some rarely used characters have far fewer samples than other character classes. Both training on and accurately recognizing such characters are challenging. This thesis therefore adopts data augmentation methods to address the imbalance among character samples and the lack of training data, and thereby improve the performance of historical character recognition. The main contents are as follows:
(1) A text augmentation method is proposed for characters with few samples. To improve recognition of few-sample characters, a GAN-based data augmentation method is used to generate character images with label information. A character-based image controller and a two-scale structure are introduced to extend the diversity of sample features and improve the quality of the generated text images.
(2) A single-character data augmentation method is proposed. Because character augmentation based on deep learning relies heavily on the number of samples, this thesis uses a GAN-based single-character augmentation method to reconstruct the regional feature information of characters and expand the diversity of sample features. The network is designed on the U-Net structure; dense connections and spectral normalization are introduced to improve the stability of the model and strengthen its ability to reconstruct feature information.
(3) A network model is proposed for historical character recognition. Given the small differences between characters and the difficulty of extracting features from Chinese historical characters, the Inception structure and the DropBlock module are introduced to strengthen feature extraction. Experimental results show that the model achieves good recognition performance on ancient Chinese characters.
Finally, to address the low recognition accuracy of the network on few-sample classes, the data augmentation models are used to enlarge the data, and recognition performance improves as the number of samples in those classes increases.
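The final step, enlarging few-sample classes with generated images, can be illustrated with a small bookkeeping sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `augmentation_plan`, the target of 20 samples per class and the toy labels are all hypothetical, and in the thesis the synthetic images themselves would come from the GAN-based models.

```python
from collections import Counter


def augmentation_plan(labels, target_per_class):
    """Map each under-represented class to the number of synthetic
    samples needed to bring it up to target_per_class.

    Classes that already meet the target are left out of the plan.
    """
    counts = Counter(labels)
    return {
        cls: target_per_class - n
        for cls, n in counts.items()
        if n < target_per_class
    }


# Hypothetical toy label list: one frequent class and two rare ones.
labels = ["common"] * 50 + ["variant_a"] * 3 + ["rare_b"] * 7

# Assumed target; a real value would be tuned on the dataset.
plan = augmentation_plan(labels, target_per_class=20)
print(plan)
```

The plan only says how many samples each rare class is missing; a generator would then be asked for that many labeled images per class before the recognition network is retrained.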
Keywords/Search Tags:Deep learning, Data augmentation, Ancient dataset, Handwritten Chinese characters recognition, Few-shot learning