
Study On Writer Adaptive Handwriting Chinese Character Recognition System Based On Comprehensive Online Unconstrained Dataset

Posted on: 2014-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 1228330401960210
Subject: Signal and Information Processing
Abstract/Summary:
Handwritten character recognition (HCR) has been a popular research direction in pattern recognition for several decades. However, because of the diversity of writing styles and the variability of handwritten characters, unconstrained handwritten character recognition remains a challenge in the HCR field. Although great progress has been made on regular handwritten Chinese characters, for which accuracy reaches 98%, accuracy on unconstrained handwriting is considerably lower, at only about 93%. Improving the recognition performance for unconstrained Chinese handwriting is therefore necessary.

Personal hand-held devices such as smartphones, personal digital assistants (PDAs), e-book readers, and tablets (such as the iPad) now play an important role in daily life. With the widespread use of touch screens and the rapid development of handwriting recognition technology, handwriting input is becoming more and more popular on mobile terminals. However, the handwritten samples collected in laboratories are limited and cannot cover all writing styles. As a result, handwritten character recognition systems cannot achieve satisfactory performance in real applications, especially for users with distinctive handwriting styles.

In view of these problems, this dissertation studies writer-adaptive handwritten Chinese character recognition based on a comprehensive online unconstrained dataset. The proposed classifier is trained on large-scale unconstrained handwritten data in order to cover as many handwriting styles as possible, so that performance is satisfactory for most users. Meanwhile, the writer adaptation algorithm proposed in this dissertation can adaptively learn a particular user’s writing style, improving the recognition rate for that user and giving the user the sense that “the more you write, the more accurate it gets”.
However, many difficulties remain to be solved in this work: the variable styles of handwritten Chinese characters; the complex structure of Chinese characters and the large number of similar characters that are difficult to distinguish; the collection and preprocessing of the handwritten dataset; the compression of the generic classifier dictionary in incremental learning; the drop in recognition rate for generic users caused by incremental learning; and incremental learning of the handwriting classifier in the discriminative feature space. To address these problems, the work done in this dissertation is as follows:

1. This dissertation focuses on isolated handwritten Chinese character recognition, including preprocessing, feature extraction, classification, and classifier combination. On that basis, it proposes a rapid handwritten character recognition system and a high-performance handwritten character recognition system. Experiments show that the rapid system reaches a recognition speed of 1.7 ms/char with a classifier dictionary of only 2 Mb. The high-performance system, in turn, clearly improves the recognition rate, reaching 97.04% on SCUT-COUCH2009 and 93.57% on CASIA-OLHWDB1.

2. Because the number of handwritten samples is limited, the public handwritten Chinese character datasets do not cover enough handwriting styles, so a comprehensive online unconstrained handwritten character dataset was collected and organized for this dissertation. The dataset covers a wide range of categories, including isolated simplified Chinese characters, isolated traditional Chinese characters, Chinese phrases, Chinese pinyin, English letters, Arabic numerals, symbols, and texts.
The dataset was collected with PDAs (personal digital assistants) and touch-screen smartphones from more than 190 writers, yielding more than 3.6 million handwritten samples. It is the first public online handwritten Chinese character dataset with a large vocabulary, high-frequency Chinese phrases, and Chinese pinyin, and it provides valuable data for online handwritten Chinese phrase recognition and pinyin recognition. In addition, a large number of handwritten samples were collected over the network, from more than 200 thousand writers and totaling more than 150 million samples, which strongly supports research on Chinese handwriting recognition.

3. Because the original incremental modified quadratic discriminant function (IMQDF) algorithm requires a large amount of parameter storage and lowers the recognition rate for generic users, this dissertation proposes a smoothing compact IMQDF (SCIMQDF) algorithm. The proposed algorithm reduces parameter storage to 1/50 of the original and improves the recognition rate for both the specific user and generic users. This dissertation also proposes an IMQDF algorithm learned in the discriminant feature subspace, which can rapidly and sequentially transform the original MQDF classifier to the updated subspace.

4. Many advanced handwritten character recognition algorithms are not applicable to hand-held electronic devices because of the heavy computation and large storage they require.
To handle this problem, this dissertation proposes and designs a handwritten character recognition system based on a cloud computing platform. Thanks to the powerful computing capacity and mass storage of the cloud server, more advanced handwritten character recognition algorithms, such as more advanced classifiers and a writer adaptation system, can be implemented on the cloud platform. The system therefore not only improves the recognition rate for generic users but also gives the specific user the sense that “the more you write, the more accurate it gets”.

In brief, the study of writer-adaptive handwritten Chinese character recognition based on a comprehensive online unconstrained dataset is a comprehensive research project involving fields such as handwritten character recognition, machine learning, image processing, and incremental learning. The system can greatly improve the recognition rate for the specific user through incremental learning without decreasing the recognition rate for generic users, and can therefore provide a more natural and user-friendly experience. These advantages suggest that writer-adaptive handwritten Chinese character recognition based on a comprehensive online unconstrained dataset will be an important research direction in the HCR field.
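The abstract names the MQDF classifier and its incremental variants but gives no formulas. As a rough illustration only, the sketch below evaluates the MQDF2 discriminant in the form commonly written in the literature and folds one new sample into a class mean. It is not the dissertation's IMQDF or SCIMQDF algorithm, whose details are not stated here. The function names (`mqdf2`, `update_mean`, `classify`), the toy class parameters, and the choice to adapt only the mean are all assumptions made for this example.

```python
import math

def mqdf2(x, mean, eigvals, eigvecs, delta):
    """MQDF2 distance from feature vector x to one class (smaller is a better match).

    eigvals, eigvecs: the k leading eigenvalues and orthonormal eigenvectors of the
    class covariance matrix. delta: constant standing in for the d - k minor eigenvalues.
    """
    d, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sq_norm = sum(v * v for v in diff)
    major = 0.0        # Mahalanobis term inside the principal subspace
    proj_energy = 0.0  # squared length of the projection onto that subspace
    for lam, phi in zip(eigvals, eigvecs):
        p = sum(a * b for a, b in zip(phi, diff))
        major += p * p / lam
        proj_energy += p * p
    residual = (sq_norm - proj_energy) / delta
    log_det = sum(math.log(lam) for lam in eigvals) + (d - k) * math.log(delta)
    return major + residual + log_det

def update_mean(mean, count, sample):
    """Fold one new sample into a running class mean (illustrative adaptation step).

    Returns the updated mean and the updated sample count.
    """
    new_count = count + 1
    new_mean = [m + (s - m) / new_count for m, s in zip(mean, sample)]
    return new_mean, new_count

def classify(x, classes):
    """Return the label whose MQDF2 distance to x is smallest.

    classes maps label -> (mean, eigvals, eigvecs, delta).
    """
    return min(classes, key=lambda label: mqdf2(x, *classes[label]))

if __name__ == "__main__":
    # Hypothetical two-class toy model in a 2-D feature space.
    classes = {
        "A": ([0.0, 0.0], [4.0], [[1.0, 0.0]], 1.0),
        "B": ([3.0, 0.0], [4.0], [[1.0, 0.0]], 1.0),
    }
    x = [1.2, 0.0]
    print("before adaptation:", classify(x, classes))
    # One sample written by the user and labelled "B" shifts that class mean.
    new_mean, n = update_mean([3.0, 0.0], 1, [1.0, 0.0])
    classes["B"] = (new_mean, [4.0], [[1.0, 0.0]], 1.0)
    print("after adaptation:", classify(x, classes))
```

In this toy setup, folding the user's sample moves the mean of class "B" from [3.0, 0.0] to [2.0, 0.0], which is enough to change the decision for the query point. A practical adaptive system would also have to update the covariance statistics and keep accuracy for generic users from degrading, which are the concerns the abstract attributes to IMQDF and SCIMQDF.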
Keywords/Search Tags: Handwriting recognition, Incremental learning, Online handwriting dataset, Writer adaptation, IMQDF, Cloud computing