
Recognition Of Off-line Handwritten Chinese Characters By Using A Decision Tree Based On Hierarchical Decomposition

Posted on: 2010-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: D Liu
GTID: 2178360275499913
Subject: Computer application technology
Abstract/Summary:
Off-line handwritten Chinese character recognition (HCCR) is a challenging problem in pattern recognition because of its large character set, the variety of writing styles, and other factors. It plays an important role in letter sorting, bank check recognition, the processing of statistical reports, the automatic input of handwritten manuscripts, and many other applications. Research on off-line HCCR is therefore significant for the automatic processing of Chinese character information and for the development of intelligent input for new-generation computers. HCCR is a very complex pattern recognition problem. Many years of research show that the effectiveness of any single method is limited and that each method has its own advantages and limitations. Multi-feature fusion and the integration of multiple schemes, drawing on information fusion technology and the organic combination of multiple methods, are considered a trend in the development of HCCR.

The research object of this thesis is a small set of Chinese characters, and the purpose is to explore an effective recognition algorithm for writer-independent (non-specific), constrained off-line handwritten Chinese characters. The experiment selected 50 commonly used Chinese characters from the GB2312-80 character set. For each character, 100 samples were collected, for a total of 5000 samples; 80% of the data were used for training and 20% for testing.

The main contents and results of this thesis are as follows:

(1) After analyzing the latest developments in Chinese character recognition technology, I designed a multi-classifier integration strategy for Chinese character recognition, namely a serial integration model of two classifiers. Rough classification uses an improved ID3 algorithm, the decision tree based on hierarchical decomposition. The algorithm is simple and does not require domain knowledge; it classifies quickly with a small amount of computation, which makes it useful for multiclass classification problems. This thesis explored the feasibility of the algorithm for off-line HCCR and made appropriate adjustments to data processing, to the selection of the overlap-degree threshold, and to the discretization of continuous attributes, the last with reference to the C4.5 algorithm. Finally, I implemented the designed system model as a program using C++ Builder. The experimental results show that the model is valid.

(2) To improve the efficiency and quality of sample collection, and to prepare for larger-scale collection later, I designed a special collection form with a location mark. The form not only meets the requirements of recognition but also simplifies some preprocessing steps and improves preprocessing efficiency.

(3) The sample database is saved in a binary format, and I defined a structure, Data, as the unit of the database. This storage format not only saves space but also speeds up reading and writing of the sample database.

(4) In rough classification, I extracted the stroke-crossing feature and the directional decomposition feature based on an elastic mesh, obtaining an attribute set of 14 attributes for decision tree classification. I then analyzed and compared different selections of the test attribute, providing a reference for further study. In fine recognition, I extracted the peripheral feature as a complement to the internal statistical features used in rough classification. Together, the two kinds of features reflect both the internal and the external structure of Chinese characters.

(5) I selected candidates from the rough classification results for fine recognition, which uses the peripheral feature together with a distance classifier. This design reduces the candidate set and improves recognition speed.
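The abstract mentions discretizing continuous attributes with reference to C4.5 before building an ID3-style tree. The sketch below illustrates that general idea only: it scores binary splits of one continuous attribute by information gain and takes candidate thresholds at midpoints between adjacent distinct values, a common simplification. The function names and the toy data are assumptions made for illustration; the thesis's hierarchical-decomposition procedure and its overlap-degree threshold are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best binary split.

    Hypothetical helper: candidate thresholds are midpoints between
    consecutive distinct sorted values of one continuous attribute.
    Returns (None, 0.0) if no split improves on the unsplit set.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        # Weighted entropy remaining after the split.
        remainder = (
            len(left) * entropy(left) + len(right) * entropy(right)
        ) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy attribute values for two character classes (made-up numbers).
toy_values = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_threshold, toy_gain = best_threshold(toy_values, toy_labels)
```

In a tree builder, the attribute whose best split yields the highest gain would be chosen as the test attribute at a node, and the procedure would recurse on each side of the threshold.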
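The serial two-classifier design described in the abstract, where rough classification narrows the candidates and a distance classifier makes the final choice, can be sketched as follows. This is a minimal illustration under stated assumptions: class templates are taken to be per-class mean feature vectors, the distance is Euclidean, and all names and numbers are hypothetical. The abstract does not specify which distance or templates the thesis actually uses.

```python
import math


def class_means(samples):
    """Map each label to the mean of its feature vectors.

    `samples` is a dict: label -> list of equal-length feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        n = len(vectors)
        means[label] = [sum(column) / n for column in zip(*vectors)]
    return means


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fine_recognize(feature, candidates, templates):
    """Return the candidate label whose template is nearest to `feature`.

    Only the labels in `candidates` (the output of the rough stage)
    are compared, which is what keeps the fine stage cheap.
    """
    return min(candidates, key=lambda label: euclidean(feature, templates[label]))


# Made-up two-dimensional "peripheral" features for three classes.
toy_samples = {
    "da": [[0.0, 0.0], [2.0, 2.0]],
    "tai": [[10.0, 10.0], [12.0, 12.0]],
    "tian": [[1.0, 2.0], [1.0, 2.0]],
}
toy_templates = class_means(toy_samples)
toy_query = [1.0, 1.9]

# The rough stage is assumed to have kept only two candidates.
restricted = fine_recognize(toy_query, ["da", "tai"], toy_templates)
unrestricted = fine_recognize(toy_query, list(toy_templates), toy_templates)
```

The two calls differ only in the candidate list, which shows the trade-off in this kind of design: the fine stage compares fewer templates, but it can only be correct if the rough stage kept the true class among its candidates.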
Keywords/Search Tags: sample collection form, elastic mesh, directional decomposition of Chinese strokes, hierarchical decomposition, decision tree