Discriminatively Train Classifiers Embedding On Synthetic String Samples For Chinese Handwritten String Recognition

Posted on:2011-12-16

Degree:Master

Type:Thesis

Country:China

Candidate:X Chen

Full Text:PDF

GTID:2178330338479954

Subject:Computer application technology

Abstract/Summary:

Off-line Chinese handwritten text recognition is one of the most challenging problems in the field of pattern recognition. Printed Chinese character recognition and on-line Chinese handwriting recognition have gradually become practical, while off-line Chinese handwritten character recognition is still considered "the hardest problem to conquer" in this field because of its inherent complexity. It has recently become a hot topic with the release of the HIT-MW database, the first text-level database aimed at realistic Chinese handwritten character recognition. We target realistic Chinese handwritten text recognition and explore three aspects of the problem. First, a system based on a segmentation-recognition integrated framework was developed for Chinese handwriting recognition. Second, the parameters of the embedded classifier, initialized by character-level training, were discriminatively re-trained at string level. Third, two perturbation models were adopted to synthesize Chinese text line samples.

The segmentation-recognition integrated framework runs as follows: the input image is first over-segmented into primitive segments, and consecutive segments are then combined into candidate patterns. The embedded classifier classifies all the candidate patterns in the segmentation lattice. According to a path evaluation function, the system outputs the optimal path in the segmentation-recognition lattice, which is the final recognition result. The embedded classifier is first trained at character level on isolated character samples, and its parameters are then updated at string level on string samples. The learning process is carried out by stochastic gradient descent, optimizing the minimum classification error (MCE) criterion.
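The lattice search described above can be sketched as a small dynamic program. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not specify the path evaluation function, so the sketch assumes a path score that is the sum of per-pattern log-scores. The names `best_path`, `Scorer`, `max_merge`, `TOY_SCORES` and `toy_score` are hypothetical, and the toy scorer stands in for the embedded classifier.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical scorer type: maps a span of primitive segments (i, j)
# to the classifier's best label and its log-score for that candidate pattern.
Scorer = Callable[[int, int], Tuple[str, float]]


def best_path(n_segments: int, max_merge: int, score: Scorer) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence over a segmentation lattice.

    Node k is the boundary after the k-th primitive segment. An edge (i, j)
    is the candidate pattern built from primitive segments i .. j-1.
    Assumption: the path score is the sum of finite edge log-scores.
    """
    if max_merge < 1:
        raise ValueError("max_merge must be at least 1")
    best = [-math.inf] * (n_segments + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n_segments + 1)
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        # Only merge up to max_merge consecutive primitive segments.
        for i in range(max(0, j - max_merge), j):
            if best[i] == -math.inf:
                continue
            label, log_score = score(i, j)
            total = best[i] + log_score
            if total > best[j]:
                best[j] = total
                back[j] = (i, label)
    # Backtrack from the last boundary to recover the label sequence.
    labels: List[str] = []
    j = n_segments
    while j > 0:
        i, label = back[j]
        labels.append(label)
        j = i
    labels.reverse()
    return labels, best[n_segments]


# Toy stand-in for the embedded classifier (made-up scores, illustration only).
TOY_SCORES = {
    (0, 2): ("A", -0.1),
    (2, 3): ("B", -0.2),
    (3, 4): ("C", -0.1),
    (0, 1): ("x", -2.0),
    (1, 2): ("y", -2.0),
    (2, 4): ("z", -3.0),
}


def toy_score(i: int, j: int) -> Tuple[str, float]:
    return TOY_SCORES.get((i, j), ("?", -5.0))


if __name__ == "__main__":
    print(best_path(4, 3, toy_score))
```

In string-level training, the scores on the lattice edges come from the classifier being updated, so the best path changes as the parameters change. The sketch covers only the decoding step.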
Experimental results show that string-level training improves the performance of Chinese handwritten text recognition by reducing insertion errors.

In addition, statistics on the number of samples per character class in the HIT-MW database show that there is a serious shortage of string samples for string-level training. Therefore, two perturbation models are adopted to synthesize Chinese string samples and expand the training set. The first applies geometrical transformations to existing natural string samples. The distortion strength is controlled by a nonlinear, continuous function whose parameters are selected randomly before each geometrical transformation. The second perturbation model first distorts each underlying character of a natural string sample with a transformation function, and then connects the distorted characters into a synthetic sample according to the original gaps between adjacent characters in the natural sample. Experiments show that the synthetic samples improve the performance of Chinese handwriting recognition not only in our segmentation-recognition integrated system but also in an HMM-based segmentation-free strategy.
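A minimal sketch of the first perturbation model follows. The abstract does not give the distortion function, so the sketch assumes one possible choice: a sinusoidal coordinate warp t + a·sin(πt) on normalized coordinates, with the strength a drawn at random for each sample. The names `warp_coordinate`, `distort_line` and `max_strength` are hypothetical, and images are plain nested lists to keep the example self-contained.

```python
import math
import random
from typing import List

Image = List[List[int]]  # rows of pixel values


def warp_coordinate(t: float, a: float) -> float:
    """Warp a normalized coordinate t in [0, 1] with a smooth nonlinear function.

    Assumed form: t + a * sin(pi * t). It is continuous, keeps the endpoints
    fixed, and is monotonic when |a| < 1 / pi.
    """
    return min(1.0, max(0.0, t + a * math.sin(math.pi * t)))


def _source_index(k: int, size: int, a: float) -> int:
    """Nearest source index for output index k along an axis of the given size."""
    if size == 1:
        return 0
    t = warp_coordinate(k / (size - 1), a)
    return min(size - 1, max(0, round(t * (size - 1))))


def distort_line(image: Image, rng: random.Random, max_strength: float = 0.2) -> Image:
    """Return a synthetic text line by warping rows and columns independently.

    The warp strengths are sampled once per call, i.e. before each
    transformation, and the output has the same size as the input.
    """
    height, width = len(image), len(image[0])
    a_x = rng.uniform(-max_strength, max_strength)
    a_y = rng.uniform(-max_strength, max_strength)
    cols = [_source_index(x, width, a_x) for x in range(width)]
    rows = [_source_index(y, height, a_y) for y in range(height)]
    # Inverse mapping with nearest-neighbour sampling.
    return [[image[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    line = [[r * 10 + c for c in range(6)] for r in range(3)]
    for row in distort_line(line, random.Random(0)):
        print(row)
```

The second model could reuse the same warp on each character image and then paste the distorted characters back with the original inter-character gaps. That step is not shown here.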

Keywords/Search Tags:

Chinese handwritten text recognition, synthetic samples, discriminative learning, segmentation-recognition integrated framework, string-level training

Related items

1	Bank Cheque In Handwritten Application Domain String Recognition
2	Research On Fast Recognition Method Of Handwritten Form Digital String Based On Self Learning
3	Research On Connected Characters Recognition For Handwritten Checks And Its Application
4	Writer-independent Unconstrained Handwritten Offline Chinese Text Line Recognition
5	Handwritten Chinese Text Recognition Based On Deep Convolution Model
6	Off-line Recognition Of Chinese Handwriting: From Isolated Character To Realistic Text
7	Offline-online Skeleton Samples Co-training Based DBLSTM Handwritten English Recognition
8	Chinese Handwritten String Recognition Methods Fusion Based On Attention Mechanism
9	Research On Algorithm For Unconstrained Handwritten Numeral String Segmentation And Recognition
10	Research On Offline Handwritten Chinese Character Cognitive Model And Similar Samples Cognition