Discriminatively Train Classifiers Embedding On Synthetic String Samples For Chinese Handwritten String Recognition

Posted on: 2011-12-16 | Degree: Master | Type: Thesis
Country: China | Candidate: X Chen
GTID: 2178330338479954 | Subject: Computer application technology
Abstract/Summary:
Off-line Chinese handwritten text recognition is one of the most challenging problems in the field of pattern recognition. Printed Chinese character recognition and on-line Chinese handwriting recognition have gradually become practical, while off-line Chinese handwritten character recognition, owing to its inherent complexity, is still regarded as "the hardest problem to conquer" in this field. It has recently become a hot topic with the release of the HIT-MW database, the first text-level database, which focuses on realistic Chinese handwritten character recognition. We target realistic Chinese handwritten text recognition and explore three aspects of the problem. First, we developed a Chinese handwriting recognition system based on a segmentation-recognition integrated framework. Second, the parameters of the embedded classifier, initialized by character-level training, were discriminatively retrained at the string level. Third, two perturbation models were adopted to synthesize Chinese text line samples.
The segmentation-recognition integrated framework works as follows. The input image is first over-segmented into primitive segments, and consecutive segments are then combined into candidate patterns. The embedded classifier classifies every candidate pattern in the segmentation lattice, and the system outputs the optimal path through the segmentation-recognition lattice according to a path evaluation function; this path is the final recognition result. The embedded classifier is first trained at the character level on isolated character samples, and its parameters are then updated at the string level on string samples. Learning is performed by stochastic gradient descent that optimizes the MCE (minimum classification error) criterion.
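The abstract does not spell out the path evaluation function, so the following is only a minimal Python sketch of how an optimal path through a segmentation-recognition lattice can be found by dynamic programming. It assumes that the path score is the sum of per-candidate log-scores, that at most max_merge consecutive primitive segments form one candidate pattern, and that a caller-supplied scorer(i, j) returns the classifier's (label, log_score) candidates for the pattern covering segments i to j-1. The names best_path, scorer and max_merge are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: scorer(i, j) returns (label, log_score) pairs for the
# candidate pattern formed by merging primitive segments i .. j-1.
Scorer = Callable[[int, int], List[Tuple[str, float]]]


def best_path(n_segments: int, scorer: Scorer,
              max_merge: int = 4) -> Tuple[float, List[str]]:
    """Find the highest-scoring path through a segmentation-recognition lattice.

    Node k means "the first k primitive segments are consumed"; an edge
    (i -> j, label) is one candidate character covering segments i .. j-1.
    The path score is assumed to be the sum of the edge log-scores.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n_segments + 1)    # best[k]: best score reaching node k
    back: List[Tuple[int, str]] = [(-1, "")] * (n_segments + 1)
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        for i in range(max(0, j - max_merge), j):
            if best[i] == neg_inf:
                continue                   # node i is unreachable
            for label, score in scorer(i, j):
                if best[i] + score > best[j]:
                    best[j] = best[i] + score
                    back[j] = (i, label)
    if best[n_segments] == neg_inf:
        return neg_inf, []                 # no complete segmentation exists
    labels: List[str] = []
    k = n_segments
    while k > 0:                           # follow back-pointers to node 0
        k, label = back[k]
        labels.append(label)
    return best[n_segments], labels[::-1]


# Toy lattice with three primitive segments; segments 0 and 1 may merge.
toy = {(0, 1): [("口", -0.5)], (1, 2): [("十", -0.6)],
       (0, 2): [("古", -0.3)], (2, 3): [("人", -0.2)]}
score, text = best_path(3, lambda i, j: toy.get((i, j), []))
```

In the toy lattice, merging the first two segments into one character (about -0.5 in total) beats keeping them apart (about -1.3), so the search returns the two labels 古 and 人. A real system would typically add further terms (for example contextual or geometric scores) to each edge score; they are omitted here.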
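String-level MCE training can be illustrated with a deliberately simplified sketch. It assumes a linear classifier with one weight vector per class (a hypothetical stand-in for the actual embedded classifier), a string discriminant equal to the sum of character scores along a path, and a single rival path such as the best incorrect path found by lattice search. The misclassification measure is d = g(rival) - g(genuine), the loss is the sigmoid l = 1 / (1 + exp(-xi * d)), and each stochastic gradient step moves the weights against the gradient of l. The names path_score, mce_step, lr and xi are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One path = list of (feature vector of a candidate pattern, assigned label).
Path = List[Tuple[Sequence[float], str]]
Weights = Dict[str, List[float]]  # hypothetical linear classifier: one vector per class


def path_score(weights: Weights, path: Path) -> float:
    """String-level discriminant: sum of linear character scores along a path."""
    return sum(sum(w * x for w, x in zip(weights[label], feats))
               for feats, label in path)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mce_step(weights: Weights, genuine: Path, rival: Path,
             lr: float = 0.1, xi: float = 1.0) -> float:
    """One SGD step on the sigmoid MCE loss; returns the loss before the update."""
    # Misclassification measure: positive when the rival path outscores the truth.
    d = path_score(weights, rival) - path_score(weights, genuine)
    loss = sigmoid(xi * d)
    dloss_dd = xi * loss * (1.0 - loss)
    # d = g(rival) - g(genuine): classes on the genuine path move toward their
    # features, classes on the rival path move away from theirs.
    for sign, path in ((1.0, genuine), (-1.0, rival)):
        for feats, label in path:
            w = weights[label]
            for k, x in enumerate(feats):
                w[k] += sign * lr * dloss_dd * x
    return loss


weights = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
genuine = [([1.0, 0.0], "a")]
rival = [([1.0, 0.0], "b")]
losses = [mce_step(weights, genuine, rival) for _ in range(20)]
```

Because the loss is a smooth function of the score margin, every string sample contributes a gradient, and the sample's influence fades as the genuine path pulls far ahead of its rival; in the toy run the loss shrinks at every step.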
Experimental results show that string-level training improves the performance of Chinese handwritten text recognition by reducing insertion errors.
In addition, statistics on the number of samples per character class in the HIT-MW database show a serious shortage of string samples for string-level training. Therefore, two perturbation models are adopted to synthesize Chinese string samples and expand the training set. The first applies geometric transformations to existing natural string samples; the distortion strength is controlled by a nonlinear continuous function whose parameters are selected at random before each transformation. The second first distorts each underlying character of a natural string sample with a transformation function, and then joins the distorted characters into a synthetic sample using the original gaps between adjacent characters in the natural sample. Experiments show that the synthetic samples improve Chinese handwriting recognition performance not only in our segmentation-recognition integrated system but also in an HMM-based segmentation-free system.
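The first perturbation model can be sketched as follows, under stated assumptions: a text line is represented by the (x, y) coordinates of its foreground pixels, the geometric transformation is a shear, and the controlling nonlinear continuous function is assumed to be a cosine wave whose amplitude, period and phase are redrawn at random before every transformation. The functions cosine_wave and shear_text_line and their parameter ranges are hypothetical; other transformations such as scaling could be driven by the same kind of function.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) of a foreground pixel; y grows downwards


def cosine_wave(rng: random.Random, max_amp: float, min_period: float,
                max_period: float) -> Callable[[float], float]:
    """Draw a random continuous strength function f(x) = A*cos(2*pi*x/P + phi)."""
    amp = rng.uniform(-max_amp, max_amp)
    period = rng.uniform(min_period, max_period)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return lambda x: amp * math.cos(2.0 * math.pi * x / period + phase)


def shear_text_line(points: List[Point], y_center: float, rng: random.Random,
                    max_angle_deg: float = 10.0) -> List[Point]:
    """Shear a text line with an angle that varies smoothly along the line.

    A fresh strength function is drawn on every call, so each synthetic copy
    of the same natural sample is distorted differently.
    """
    width = max((x for x, _ in points), default=0.0) + 1.0
    angle = cosine_wave(rng, max_angle_deg, width / 2.0, 2.0 * width)
    out = []
    for x, y in points:
        t = math.tan(math.radians(angle(x)))
        # Horizontal shift proportional to the distance from the center line.
        out.append((x + t * (y_center - y), y))
    return out


line = [(0.0, 0.0), (5.0, 10.0), (20.0, 5.0)]
copies = [shear_text_line(line, 5.0, random.Random(seed)) for seed in range(3)]
```

Seeding a separate random.Random per call keeps each synthetic copy reproducible; pixels lying on the center line never move, and the shift elsewhere is bounded by tan(max_angle_deg) times the distance from that line.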
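The second perturbation model can be sketched in the same representation. The abstract names only "a transformation function", so the sketch assumes a random per-character scaling plus a small rotation about the character's center, measures the horizontal gaps between adjacent bounding boxes of the natural sample, and also keeps each character's original vertical center. The functions distort_char and synthesize_string and their parameter ranges are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Char = List[Point]  # foreground pixel coordinates of one segmented character (non-empty)


def _bbox(ch: Char) -> Tuple[float, float, float, float]:
    xs = [p[0] for p in ch]
    ys = [p[1] for p in ch]
    return min(xs), min(ys), max(xs), max(ys)


def distort_char(ch: Char, rng: random.Random, max_scale: float = 0.15,
                 max_rot_deg: float = 8.0) -> Char:
    """Assumed per-character transform: random scaling and rotation about the center."""
    x0, y0, x1, y1 = _bbox(ch)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)
    a = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    c, s = math.cos(a), math.sin(a)
    out = []
    for x, y in ch:
        dx, dy = (x - cx) * sx, (y - cy) * sy
        out.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return out


def synthesize_string(chars: List[Char], rng: random.Random) -> List[Char]:
    """Distort every character, then re-join them using the natural sample's gaps."""
    boxes = [_bbox(ch) for ch in chars]
    gaps = [boxes[k + 1][0] - boxes[k][2] for k in range(len(chars) - 1)]
    result: List[Char] = []
    cursor = boxes[0][0] if chars else 0.0  # left edge for the next character
    for k, ch in enumerate(chars):
        warped = distort_char(ch, rng)
        wx0, wy0, wx1, wy1 = _bbox(warped)
        # Keep the original vertical center so the text line does not drift.
        dy = (boxes[k][1] + boxes[k][3]) / 2.0 - (wy0 + wy1) / 2.0
        result.append([(x - wx0 + cursor, y + dy) for x, y in warped])
        if k < len(gaps):
            cursor += (wx1 - wx0) + gaps[k]  # new right edge plus original gap
    return result
```

Because only the character shapes are perturbed while the inter-character gaps are copied from the natural sample, the spacing statistics of real handwriting are preserved; negative gaps (overlapping characters) are carried over unchanged as well.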
Keywords/Search Tags: Chinese handwritten text recognition, synthetic samples, discriminative learning, segmentation-recognition integrated framework, string-level training