
Multi-contexts Based Online Handwritten Chinese Text Recognition Methods And System Implementation

Posted on: 2018-04-22 | Degree: Master | Type: Thesis
Country: China | Candidate: L Q Qiu
GTID: 2348330533466319 | Subject: Electronic and communication engineering
Abstract/Summary:
With the popularization of smartphones, handwriting input has attracted more and more users. At the same time, handwriting input methods and related applications demand better handwritten text recognition technology. We therefore study methods for handwritten Chinese text recognition and try to address the drawbacks of existing technologies. The main work of this thesis is as follows:

1) We collect a handwritten Chinese character database named SCUT-onHCCTestDB. It contains 450,000 samples in 9,798 classes (including 195 symbol classes and 785 rarely-used character classes), written in a variety of styles. The data are divided into five subsets: simplified Chinese, traditional Chinese, mixed simplified and traditional Chinese, rarely-used Chinese characters, and symbols. The database can be used in areas such as handwritten Chinese character recognition and handwritten text segmentation.

2) In the overlaid and text-line handwriting input modes, the over-segmentation algorithm may split a single character into multiple segments, which lowers the recognition rate. To solve this problem, we propose a binary class-irrelevant geometric model. With this model, the segmentation error rates on single characters decrease from 11.51% and 27.68% to 3.89% and 4.40%, respectively, and the character recognition rates increase from 90.07% and 81.27% to 93.63% and 93.88%, with corresponding relative error rate ratios (RERR) of 65.61% and 88.68%. We also find that the binary class-irrelevant geometric model outperforms the linear density model.

3) To overcome the limitations of traditional file-based word association and N-gram language models, this thesis presents a language model based on the Long Short-Term Memory (LSTM) recurrent neural network. On the SogouCA corpus, our best model achieves a perplexity of 25.32. Experiments show that the LSTM language model outperforms traditional methods for phrase association and also improves the overall performance of the handwritten text recognition system.

4) We propose an unconstrained handwriting input mode that combines the single-character, overlaid, and text-line handwriting input modes. Its core algorithm is a two-level segmentation network, and experiments verify the feasibility of the proposed mode. We have also applied the scheme to the SCUT gPen and HuiPen handwriting input methods, which serve more than 80,000 users.
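Perplexity, the metric behind the 25.32 figure reported for the LSTM language model, is the exponential of the average negative log-likelihood a model assigns to each token of held-out text. The following is a minimal Python sketch of that standard definition. It assumes the per-token probabilities have already been produced by some language model; the function name and input format are illustrative and not taken from the thesis.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a language model over one evaluation sequence.

    token_probs[i] is assumed to be the probability the model assigned to
    the i-th reference token given its preceding context.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)
```

For example, a model that assigns probability 0.04 to every token has a perplexity of about 25, so a perplexity of 25.32 roughly means the model is, on average, as uncertain as a uniform choice among about 25 candidates. Lower values are better.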
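The abstract does not describe the internals of the binary class-irrelevant geometric model. Purely as a hedged illustration, the sketch below treats it as a binary classifier that decides, from geometric features of two consecutive segments and without consulting any character class, whether the gap between them is a real character boundary; over-split pieces judged to lie inside one character are merged. The `Box` type, the three features, the logistic-regression form, and the weights are all hypothetical placeholders, not the thesis's model.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one stroke segment (x grows to the right)."""
    left: float
    top: float
    right: float
    bottom: float


def boundary_features(a: Box, b: Box, char_height: float) -> list[float]:
    """Hypothetical class-irrelevant geometric features for consecutive segments.

    Every feature is divided by the estimated character height so it does not
    depend on writing size, and none uses character-class information.
    """
    gap = (b.left - a.right) / char_height  # negative when the boxes overlap
    overlap = max(0.0, min(a.right, b.right) - max(a.left, b.left)) / char_height
    merged_width = (max(a.right, b.right) - min(a.left, b.left)) / char_height
    return [gap, overlap, merged_width]


# Placeholder weights for illustration only (not from the thesis).
WEIGHTS = [6.0, -4.0, 4.0]
BIAS = -6.0


def boundary_probability(a: Box, b: Box, char_height: float) -> float:
    """Logistic score: probability that a and b belong to different characters."""
    feats = boundary_features(a, b, char_height)
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def merge_segments(boxes: list[Box], char_height: float,
                   threshold: float = 0.5) -> list[list[int]]:
    """Group segment indices into characters, merging across false boundaries.

    Only neighbouring segments are compared, which is a simplification.
    """
    if not boxes:
        return []
    groups = [[0]]
    for i in range(1, len(boxes)):
        if boundary_probability(boxes[i - 1], boxes[i], char_height) < threshold:
            groups[-1].append(i)  # judged to be inside the same character
        else:
            groups.append([i])    # judged to be a real character boundary
    return groups
```

With these placeholder weights, `merge_segments([Box(0, 0, 0.4, 1), Box(0.45, 0, 1.0, 1), Box(1.3, 0, 2.3, 1)], 1.0)` returns `[[0, 1], [2]]`: the two close, narrow pieces are merged into one character, and the distant third segment starts a new one. Because the decision never looks at recognition scores, the same rule applies to every character class; in practice the weights would presumably be learned from labeled segmentation points.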
Keywords/Search Tags: Chinese Text Recognition, Binary Class-irrelevant Geometric Model, Language Model, Unconstrained Handwriting