
Research On Off-line Handwritten Chinese Character Recognition Based On SVM

Posted on: 2010-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Xiao
Full Text: PDF
GTID: 2178360275999978
Subject: Computer application technology
Abstract/Summary:
Off-line handwritten Chinese character recognition (OHCCR) is one of the most difficult problems in pattern recognition today, and one of the main obstacles to digitizing handwritten Chinese text. Research on OHCCR plays an important role in the automatic processing of Chinese character information and in making computers more intelligent. Because OHCCR is a multi-class classification problem over complex patterns, previous research has relied on integrating multiple classifiers to reach high recognition rates, which also increases system overhead. Support Vector Machine (SVM) theory is grounded in statistical learning theory and has a sound theoretical framework. SVMs have particular advantages on problems with small sample sets, non-linearity, and high dimensionality. Applying SVM theory to OHCCR therefore has considerable theoretical significance and practical value.
The main parts of this thesis are as follows:
(1) Overview. This part explains the purpose and significance of OHCCR research, describes the difficulties in the field, and summarizes the general approach to OHCCR drawn from previous research.
(2) Design of the off-line handwritten Chinese character automatic input interface. The interface is based on TWAIN (popularly expanded as "Technology Without An Interesting Name"), a standard communication protocol between computer applications and image acquisition devices. It automatically scans handwritten Chinese characters on paper into the computer at the user's request. The interface thus facilitates the subsequent feature extraction and character recognition and improves overall efficiency.
(3) Collection of off-line handwritten Chinese character samples and image preprocessing.
Special forms were designed to collect samples from writers of different occupations, genders, and education levels, aged between 18 and 60. Because the manpower and funding available for the research were limited, the collected character classes are the first 50 classes of the GB2312 level-1 character set, with 108 samples collected per class. Preprocessing of each sample image consists of grayscale conversion, binarization, slant correction, form-marker location, character segmentation, and normalization.
(4) Feature extraction. This thesis uses mesh directional features as the handwritten Chinese character features. First, it describes four algorithms for constructing elastic meshes, then compares them and analyzes how well each copes with handwriting distortion and how each performs. Second, it describes three directional decomposition algorithms, analyzes the advantages and disadvantages of each, and concludes that each suits a different application scenario. Weighing these considerations, it selects the skeleton feature, the edge feature, and the stroke feature as the three handwritten Chinese character features. Each feature combines an elastic mesh based on point-density equalization with one of the three directional decomposition algorithms. The thesis also improves the skeleton feature so that it combines the advantages of the "AND" and "OR" decompositions. It then qualitatively compares the three feature extraction algorithms in terms of algorithmic complexity and stroke accuracy. At the end of the chapter, experimental results show that these features effectively capture the characteristics of handwritten Chinese characters.
(5) Chinese character recognition based on SVM. This thesis applies the skeleton, edge, and stroke features to an SVM classifier for the first time, using the SVM method for OHCCR.
Trained on only a small number of character samples, the method yields a classifier that generalizes well, and it achieves a good recognition rate even though only dozens of samples are used for training.
This thesis studies a small number of commonly used Chinese characters. Its research goal is to explore the effectiveness of the SVM algorithm for recognizing writer-independent, loosely constrained off-line handwritten Chinese characters. The experiments use the first 50 classes of the GB2312 level-1 character set, with 108 samples per class and 5400 samples in all. The samples were trained and tested with the LIBSVM 2.86 toolbox, and a good recognition rate was obtained.
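The elastic mesh based on point-density equalization mentioned in part (4) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual algorithm, so the code below assumes one common formulation: mesh boundaries are placed along each axis so that every strip holds roughly the same share of stroke pixels, based on the row and column projections of a binary image. The function names `elastic_boundaries` and `mesh_features` are hypothetical, and the directional decomposition step is left out, so each cell records only its share of stroke pixels rather than per-direction counts.

```python
from itertools import accumulate


def elastic_boundaries(projection, n_cells):
    """Split indices 0..len(projection) into n_cells strips that hold
    roughly equal shares of the projected stroke pixels (assumed scheme)."""
    total = sum(projection)
    size = len(projection)
    if total == 0:
        # Blank image: fall back to a uniform grid.
        return [round(k * size / n_cells) for k in range(n_cells + 1)]
    cumulative = list(accumulate(projection))
    bounds = [0]
    for k in range(1, n_cells):
        target = total * k / n_cells
        # First index whose cumulative count reaches the k-th quantile.
        idx = next(i for i, c in enumerate(cumulative) if c >= target) + 1
        bounds.append(max(idx, bounds[-1]))
    bounds.append(size)
    return bounds


def mesh_features(image, n_cells):
    """Return, for an n_cells x n_cells elastic mesh, the fraction of
    stroke pixels (value 1) that falls in each cell, in row-major order."""
    row_proj = [sum(row) for row in image]
    col_proj = [sum(col) for col in zip(*image)]
    rb = elastic_boundaries(row_proj, n_cells)
    cb = elastic_boundaries(col_proj, n_cells)
    total = sum(row_proj) or 1  # avoid division by zero on blank images
    features = []
    for i in range(n_cells):
        for j in range(n_cells):
            count = sum(
                image[y][x]
                for y in range(rb[i], rb[i + 1])
                for x in range(cb[j], cb[j + 1])
            )
            features.append(count / total)
    return features


if __name__ == "__main__":
    # Toy 4x4 binary "character": 1 = stroke pixel, 0 = background.
    toy = [
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1],
    ]
    print(mesh_features(toy, 2))
```

Because the mesh lines follow the stroke density instead of a fixed grid, the same character written with stretched or compressed parts tends to map to similar cells. A feature vector of this kind is what would then be passed to an SVM trainer such as the LIBSVM toolbox named in the abstract.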
Keywords/Search Tags: Chinese Character Recognition, TWAIN, Feature Extraction, SVM