Mongolian Offline Handwriting Recognition

Posted on: 2021-05-18  Degree: Doctor  Type: Dissertation
Country: China  Candidate: D E J Fan  Full Text: PDF
GTID: 1368330620976624  Subject: Computer application technology
Abstract/Summary:
Handwriting recognition has always been an important research field within pattern recognition and has received extensive attention from academia. For widely used languages (such as Chinese, English, and Japanese), handwriting recognition research has evolved from simple isolated word recognition to text line recognition, unconstrained handwriting recognition, document recognition, and scene text recognition. Research on Mongolian offline handwriting recognition, by contrast, is still at an initial stage, and related work is very limited. Moreover, the huge vocabulary, free writing style, and severely deformed characters pose great challenges for Mongolian offline handwriting recognition. This dissertation therefore takes traditional Mongolian as its research object and studies offline handwriting recognition.

Because no handwritten data set existed for Mongolian, we built the first manually annotated Mongolian word-level offline handwriting data set. Words were collected from Mongolian dictionaries and organized, the samples were handwritten by many writers, and the handwritten words were extracted. The data set was then formed through manual checking, modification, and pre-processing.

The main contributions of this work are as follows:

(1) To address the huge vocabulary of Mongolian, we propose a segmentation-based method for large-vocabulary Mongolian handwriting recognition. In this method, the entire word is not modeled; instead, smaller units are modeled to deal with the large-vocabulary problem. Based on Mongolian word formation, encoding, and grammar, three kinds of segmentation units, "Twelve Prefix Character", "Presentation Forms", and "Grapheme Code", were selected, and their effects were verified by experiments. "Grapheme Code" was finally chosen as the segmentation unit for Mongolian words.

(2) Because Mongolian handwritten words have the characteristics of sequence data and are seriously distorted, this dissertation uses a hybrid model of HMM and DNN to build the Mongolian handwriting recognition system. Mongolian handwritten images are regarded as one-dimensional random sequences generated along the writing direction. The HMM describes the sequence generation process, and the DNN describes the probability distribution of the sequence data. Under this view, the Mongolian handwriting recognition problem is treated as equivalent to the speech recognition process, so methods that have succeeded in speech recognition are transplanted into the Mongolian handwriting recognition system, where they achieve good results.

(3) To handle the strong correlation between characters in handwritten Mongolian, the dependencies between preceding and following characters, including long-range dependencies, are modeled with a recurrent neural network (RNN), and a sequence-to-sequence Mongolian handwriting recognition method is proposed. Since an image is essentially a two-dimensional sequence, handwritten images are scanned in four directions by a two-dimensional LSTM with CTC output, which maps the two-dimensional sequence directly to Mongolian grapheme codes.

(4) To address the high out-of-vocabulary (OOV) rate in Mongolian, a decoding scheme that combines the CTC output with a sub-word language model is proposed. Although the two-dimensional LSTM has better modeling capability, the network itself has no effective decoding scheme. Because Mongolian words are formed from stems and suffixes, the dictionary words are organized with weighted finite-state transducers, which reduces the time complexity of decoding. A variety of experiments show that the sub-word based decoding scheme not only achieves a higher recognition rate for in-vocabulary words but also has some ability to recognize OOV words.

To conclude, through the research on the above four aspects, the performance of Mongolian offline handwriting recognition has reached a high level, which can provide technical support for other Mongolian information processing tasks. The results of this dissertation can also provide technical support for Mongolian handwritten document recognition, which is significant for mining and using Mongolian handwritten document resources and for inheriting and developing minority culture.
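
The decoding ideas in contributions (3) and (4) can be illustrated with a minimal Python sketch. It is not the dissertation's decoder: the abstract describes weighted finite-state transducers with a sub-word language model, whereas this sketch swaps in plain best-path CTC decoding followed by a simple stem-plus-suffixes lexicon check. The integer labels standing for grapheme codes, the blank index 0, and the function names `ctc_greedy_decode` and `is_stem_plus_suffixes` are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)

Labels = Tuple[int, ...]  # a sequence of grapheme-code ids (hypothetical ids)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it is not blank and differs from the
        # previous frame's label (consecutive repeats are merged).
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded


def is_stem_plus_suffixes(labels: Labels,
                          stems: Set[Labels],
                          suffixes: Set[Labels]) -> bool:
    """Return True if labels is one stem followed by zero or more suffixes."""
    n = len(labels)
    # reachable[i] is True when labels[:i] is a stem plus zero or more suffixes.
    reachable = [False] * (n + 1)
    for i in range(1, n + 1):
        if labels[:i] in stems:
            reachable[i] = True
        else:
            reachable[i] = any(
                reachable[j] and labels[j:i] in suffixes for j in range(1, i)
            )
    return reachable[n]


# Toy example: 3 labels (0 = blank, 1 and 2 = grapheme codes), 5 frames.
toy_frames = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
]
toy_word = tuple(ctc_greedy_decode(toy_frames))
print(toy_word, is_stem_plus_suffixes(toy_word, {(1,)}, {(2,)}))
```

The lexicon check shows why sub-word units help with OOV words: a word that never appears as a whole in the dictionary is still accepted when it decomposes into a known stem and known suffixes.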
Keywords/Search Tags: Mongolian, Offline Handwriting Recognition, Hidden Markov Model, Deep Neural Network, Recurrent Neural Network