
Research On Words Recognition Of Historical Mongolian Documents Based On Sequence To Sequence Model

Posted on: 2021-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Kang
Full Text: PDF
GTID: 2415330620476441
Subject: Computer Science and Technology
Abstract/Summary:
Historical Mongolian documents provide rich and reliable information for studying Mongolian culture. To better rescue, preserve, mine and use these documents, the library of Inner Mongolia University has digitized its collection, scanning the historical Mongolian documents and storing them as images. However, word images are difficult to retrieve, analyze and mine. The most effective solution is to use optical character recognition (OCR) to convert the images into text.

Mongolian OCR methods can be roughly divided into two types: segmentation-based methods, which split a word into a series of glyphs along the writing direction and then recognize each glyph separately, and segmentation-free methods. The segmentation step places high demands on image quality. Most historical Mongolian documents, however, are very old, and their scans are of poor quality because of extensive stains, broken strokes and fading. The handwriting in these documents is also heavily distorted, so most words cannot be segmented into glyphs accurately. Segmentation-free approaches are therefore the better fit. The existing segmentation-free recognition method for historical Mongolian documents is based on a Convolutional Neural Network (CNN) and treats holistic word recognition as an image classification task. It achieves segmentation-free recognition, but it cannot handle Out-of-Vocabulary words. In response, the main contributions of this thesis are as follows:

(1) To address segmentation-free recognition of Mongolian documents, a Sequence to Sequence (Seq2Seq) recognition model with an attention mechanism is proposed. The word image to be recognized is treated as a sequence of image frames, and the textual annotation of the word is treated as a character sequence. The Seq2Seq model maps the frame sequence to the character sequence. The model consists of an encoder, a decoder and an attention network. The encoder consists of a Deep Neural Network (DNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) network. The DNN extracts features from the input frame sequence, the Bi-LSTM captures the contextual relationships between frames, and the encoder outputs a feature vector sequence corresponding to the frame sequence. The decoder consists of a Long Short-Term Memory (LSTM) network and a Softmax classification layer, and it decodes the feature vector sequence produced by the encoder into the corresponding character sequence (the recognition result). The attention network connects the encoder and the decoder, so that at each decoding step the decoder attends to the one or more image frames most relevant to the target character, which improves decoding accuracy. The proposed model not only handles input and output sequences of unequal length, but also overcomes the Out-of-Vocabulary problem. The experimental results show that the proposed method outperforms both the segmentation-based method and the existing segmentation-free method in recognition accuracy.

(2) To address the shortage of training data for historical Mongolian document recognition, this thesis proposes a data augmentation method based on Cycle-Consistent Generative Adversarial Networks (CycleGAN). CycleGAN is composed of two symmetric generative adversarial networks and can convert between two sample spaces without paired training data. A cycle-consistency loss ensures that an image converted from the source space to the target space can be converted back to the original space. To obtain a new sample, a word image from a historical Mongolian document is fed to the trained CycleGAN, where it is first converted into a sample in the target space and then converted back into a sample in the original space. In this way, new images of the same word are obtained, which realizes data augmentation. The experimental results show that the augmented data further improves the recognition accuracy of the proposed Seq2Seq recognition model with attention mechanism.
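As an illustration of the attention step described in (1), the following is a minimal sketch of how a decoder state could be used to weight the encoder's per-frame feature vectors at one decoding step. It is not the implementation used in the thesis. The dot-product scoring function, the names attend and softmax, and the toy numbers are all assumptions made for this example; the abstract does not state which scoring function the attention network uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return (weights, context) for a single decoding step.

    Assumption: dot-product scoring between the decoder state and each
    encoder feature vector. The thesis may use a different scoring function.
    """
    # One relevance score per image frame.
    scores = [
        sum(d * h for d, h in zip(decoder_state, frame))
        for frame in encoder_outputs
    ]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame features.
    dim = len(encoder_outputs[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Toy input: four frame feature vectors of dimension 3 (made-up values).
encoder_outputs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
# Hypothetical decoder state that aligns best with the second frame.
decoder_state = [0.0, 4.0, 0.0]

weights, context = attend(decoder_state, encoder_outputs)
```

In a full model, the context vector would be combined with the decoder LSTM state before the Softmax layer predicts the next character, and the weights would be recomputed at every decoding step.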
Keywords/Search Tags: historical Mongolian documents, segmentation-free recognition, Sequence to Sequence model, attention mechanism, data augmentation