
Key Technologies For Recognition Of On-line Handwritten Uyghur Characters And Words

Posted on: 2014-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y R Y B L Y Ma
GTID: 1268330398454826
Subject: Communication and Information System
Abstract/Summary:
Online handwritten character recognition is a challenging and important topic in pattern recognition, as well as a comprehensive technology. In recent years, with the growing use of mobile devices such as cell phones, tablet computers and digital pens, online handwriting input has attracted great attention as a natural and convenient input method and is widely used in daily life. Uyghur is an official language of the Xinjiang Uyghur Autonomous Region of China, where it is widely used. However, research on Uyghur handwriting recognition lags far behind, and little work has been done in this area. Research on recognition techniques for online handwritten Uyghur characters not only provides a valuable reference for starting research on the handwritten scripts of other ethnic groups, but also has far-reaching significance for the development of information technology and the national culture of specific ethnic groups.
Uyghur words are formed by concatenating characters, and their written structure differs markedly from that of Chinese and English. Uyghur is written from right to left, and each letter may take a different shape depending on its position in the word. All of these characteristics make recognition difficult. Based on an in-depth study of research trends in online handwriting recognition in China and abroad, and on an analysis of the unique shapes and writing styles of Uyghur characters, this dissertation proposes approaches for online handwritten Uyghur character and word recognition. The major contributions are as follows:
1. A handwritten Uyghur character database and a handwritten Uyghur word database are established, both for the first time. To support research on Uyghur handwriting recognition, we collected samples of online Uyghur handwriting. The character dataset contains 78,336 samples of 128 classes (the 32 letters, each in four positional forms), written by 612 volunteers, including students and teachers. The word dataset contains 584,000 samples of 1,460 commonly used words, written by 400 volunteers, also students and teachers. The databases can be used for typical research tasks in handwritten document analysis, such as handwriting recognition, handwritten document retrieval and writer identification.
2. A framework for online handwritten Uyghur character recognition is presented. We evaluate various normalization, feature extraction and classification techniques that have been applied successfully to handwritten Chinese character recognition. Specifically, we use eight normalization methods, including linear normalization (LN), moment normalization (MN), bi-moment normalization (BMN), centroid-boundary alignment (CBA) and several corresponding pseudo-2D normalization methods. We use normalization-cooperated feature extraction (NCFE) with different settings. For classification, we use four classifiers: the modified quadratic discriminant function (MQDF), the discriminative learning quadratic discriminant function (DLQDF), the learning vector quantization (LVQ) classifier, and the support vector classifier with an RBF kernel (SVC-rbf). In addition, geometric features characterizing the spatial context in handwritten documents are extracted to improve recognition performance. In experiments on 38,400 test samples of 128 classes, the proposed approach achieved an accuracy of 89.08%.
3. We designed an online handwritten Uyghur character recognition system based on dynamic time warping (DTW) and carried out systematic theoretical and experimental research on its modules: pre-processing, feature extraction, cluster analysis and classification. In pre-processing, to preserve the structural information of the characters, we apply linear normalization and nonlinear normalization based on the dot density method, in line with the features of handwritten Uyghur characters. Because Uyghur contains many similar characters, feature extraction combines structural and statistical features, such as uniform sampling features, direction features, grid direction density features and two-directional projection features. Cluster analysis uses a dynamic clustering algorithm based on the minimum spanning tree (MST), and classification uses nearest-neighbor matching. Experimental results show that the top-1 recognition rates for the four character forms are 74.67%, 70.42%, 63.33% and 72.02%, respectively; the rates at which the correct character is among the top two candidates are 86.85%, 86.09%, 80.43% and 88.41%, and among the top five candidates 94.34%, 94.19%, 93.15% and 95.86%, respectively.
4. An online handwritten Uyghur character recognition method based on the integration of multiple classifiers is presented. Combining multiple classifiers compensates, to a certain extent, for the weaknesses of a single classifier, so it has been widely applied in pattern recognition. We apply five different feature extraction methods to construct five separate classifiers and combine them with a rank-based weighted voting strategy. Each classifier uses nearest-neighbor classification based on the DTW matching distance. Experimental results show that the recognition rate of the integrated classifier is significantly higher than those of the individual classifiers, and the method also provides a variety of effective ways to integrate features comprehensively.
5. To separate the many connected characters in cursive Uyghur handwriting, we present a novel character segmentation method based on dynamic programming. First, after delayed strokes are removed from the handwritten word, potential breakpoints are detected at concavities and ligatures by temporal and shape analysis of the stroke trajectory. The delayed strokes are then restored, yielding a sequence of primitive segments. Next, candidate segmentation paths are created by merging neighboring segments. These paths are evaluated using character recognition scores and geometric information, and dynamic programming is applied to find the best segmentation points for each character. Preliminary experiments on an online Uyghur word dataset show that the proposed method segments cursive handwritten Uyghur characters well.
6. For character segmentation, we adopt a two-level scheme in which a word is first segmented into conjoined sections, which are then cut into characters. We propose a conjoined-section segmentation algorithm and a character segmentation algorithm, addressing both the conjoined-section segmentation problem and the character segmentation problem.
7. Lexicon-driven online handwritten Uyghur word recognition approaches based on integrated segmentation and recognition are presented. Word recognition is transformed into an optimization problem of matching lexicon entries against the handwritten word image. Because cursive Uyghur writing contains many connected characters, segmenting and recognizing Uyghur words is very difficult. Our solution integrates segmentation and recognition so that the optimal segmentation and recognition result is obtained from a joint search. In the first step, an over-segmentation algorithm splits the word, and a candidate segmentation lattice is formed by combining adjacent fragments. In the second step, a lexicon-driven approach incorporates character recognition information, geometric information and lexicon information into the path matching procedure of the word recognition system. Preliminary experiments on an online Uyghur word dataset show that the proposed method gives a high recall rate for segmentation point detection. A confidence transformation method then converts the similarity scores into probabilities, which makes the weighting parameters easier to tune. Dynamic matching between the characters of a lexicon entry and segments of the input word image is used to rank the lexicon entries and find the best match. As a result, the recognition performance for lexicons of size 100, 500, 1,000 and 10,000 is 84%, 78%, 68% and 47%, respectively.
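The normalization step of the second contribution can be illustrated with a minimal sketch of linear normalization (LN) and moment normalization (MN) applied to an online trajectory. It assumes a character is a list of (x, y) pen points and that the target is a 64 x 64 square; the function names, the box size and the 4-standard-deviation window used for MN are illustrative assumptions, not the dissertation's settings.

```python
import math


def linear_normalize(points, size=64.0):
    """Linear normalization (LN): scale the bounding box uniformly so its longer
    side fills a size x size square, centering the shorter side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    w, h = max(xs) - min_x, max(ys) - min_y
    scale = size / max(w, h, 1e-9)  # guard against a single-point input
    off_x = (size - w * scale) / 2
    off_y = (size - h * scale) / 2
    return [((x - min_x) * scale + off_x, (y - min_y) * scale + off_y) for x, y in points]


def moment_normalize(points, size=64.0):
    """Moment normalization (MN): move the centroid to the center of the square
    and scale each axis so that 4 standard deviations span the square
    (the window width is an assumed, commonly used choice)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sx = math.sqrt(sum((x - cx) ** 2 for x, _ in points) / n)
    sy = math.sqrt(sum((y - cy) ** 2 for _, y in points) / n)
    ax = size / max(4 * sx, 1e-9)
    ay = size / max(4 * sy, 1e-9)
    return [((x - cx) * ax + size / 2, (y - cy) * ay + size / 2) for x, y in points]
```

LN keeps the aspect ratio but is sensitive to stray points at the extremes; MN uses centroid and second moments, so a single outlying dot has less influence on the scaling.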
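The uniform sampling and direction features listed in the third contribution might look roughly like the sketch below, which resamples a trajectory to points equally spaced along its arc length and builds a length-weighted 8-direction histogram. The helper names, the 32-point default and the 8 bins are hypothetical choices; the abstract does not give the dissertation's exact feature definitions.

```python
import math


def resample(points, n=32):
    """Resample a pen trajectory to n points (n >= 2) equally spaced along its arc length."""
    # Cumulative arc length at each original point.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0:
        return [points[0]] * n
    out = []
    j = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the original segment that contains the target arc length.
        while j < len(points) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0 else (target - dists[j]) / seg
        x0, y0 = points[j]
        x1, y1 = points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def direction_histogram(points, bins=8):
    """Histogram of segment directions quantized into `bins` sectors, weighted by
    segment length and normalized to sum to 1 (angles follow the input's y axis)."""
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += length
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Resampling first makes the feature vector length independent of writing speed and device sampling rate, which matters for online input.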
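Nearest-neighbor classification under a dynamic time warping distance, used in the third and fourth contributions, can be sketched as follows. This assumes each sample and template is a sequence of (x, y) points and uses the plain Euclidean local cost with the standard three-way recursion; `ranked_labels` is a hypothetical helper whose first k entries form a top-k candidate list, the kind of list behind the reported top-two and top-five rates.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points,
    with Euclidean distance as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cost of the best monotone alignment of a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the vertical, horizontal or diagonal predecessors.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def ranked_labels(sample, templates):
    """Class labels sorted by their best DTW distance to `sample`.
    `templates` is a list of (label, point_sequence) pairs."""
    best = {}
    for label, seq in templates:
        dist = dtw_distance(sample, seq)
        if label not in best or dist < best[label]:
            best[label] = dist
    return sorted(best, key=best.get)


def nearest_neighbor(sample, templates):
    """1-NN decision: the label of the closest template."""
    return ranked_labels(sample, templates)[0]
```

The quadratic DP table is the usual cost of DTW; a band constraint around the diagonal is a common way to reduce it when many templates must be compared.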
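For the cluster analysis step of the third contribution, one common way to cluster templates with a minimum spanning tree is to cut every MST edge longer than a threshold and treat the remaining connected components as clusters. The sketch below does this with Kruskal's algorithm and a union-find structure; the fixed threshold is an assumption, since the abstract does not say how the dissertation's dynamic clustering decides where to cut.

```python
def mst_clusters(items, distance, threshold):
    """Cluster `items` by building a minimum spanning tree (Kruskal) and cutting
    every MST edge longer than `threshold`. Returns lists of item indices."""
    n = len(items)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (distance(items[i], items[j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    for w, i, j in edges:
        if w > threshold:
            break  # every remaining MST edge would exceed the threshold and be cut
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # this edge belongs to the MST; merge the two trees
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For character templates, `distance` could be a DTW distance, so that each cluster can be represented by a few prototypes instead of every training sample.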
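The fourth contribution combines five classifiers by voting, but the abstract does not spell out the exact weighting scheme. A weighted Borda count is one plausible rank-based reading; the sketch below assumes each classifier outputs a ranked candidate list and that per-classifier weights (hypothetical here) are given.

```python
from collections import defaultdict


def borda_combine(rankings, weights=None):
    """Combine ranked candidate lists from several classifiers by a weighted Borda count.
    rankings: one list of labels per classifier, best candidate first.
    weights:  optional per-classifier weights (default 1.0 each)."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, label in enumerate(ranking):
            # The top candidate earns n points, the last one earns 1.
            scores[label] += w * (n - pos)
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda lab: (-scores[lab], lab))
```

Because every candidate in each list contributes, a label that is consistently ranked second by all classifiers can beat one that is ranked first by only a single classifier.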
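The dynamic-programming search in the fifth contribution can be sketched as follows, assuming the word has already been cut into primitive segments and that a hypothetical `score(i, j)` function combines recognition and geometric evidence that segments i..j-1 form one character. Limiting merges to `max_merge` neighbors (3 here, an assumed value) keeps the search linear in the number of segments.

```python
def best_segmentation(n_segments, score, max_merge=3):
    """Group primitive segments 0..n_segments-1 into characters.
    score(i, j) rates the hypothesis that segments i..j-1 form one character
    (higher is better). Returns (total_score, list of (start, end) spans)."""
    neg_inf = float("-inf")
    best = [neg_inf] * (n_segments + 1)  # best[j]: best score covering segments 0..j-1
    back = [0] * (n_segments + 1)        # back[j]: start of the last character in that path
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        for i in range(max(0, j - max_merge), j):
            if best[i] == neg_inf:
                continue
            s = best[i] + score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # Trace the chosen character boundaries back from the end of the word.
    spans = []
    j = n_segments
    while j > 0:
        spans.append((back[j], j))
        j = back[j]
    return best[n_segments], spans[::-1]
```

Each span boundary in the result corresponds to a selected segmentation point; merging neighboring segments lets one character absorb an over-segmented stroke.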
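For the lexicon-driven word recognition of the seventh contribution, the sketch below aligns each lexicon entry's characters with the primitive segments by dynamic programming and ranks the entries by total log-probability. The sigmoid used as the confidence transformation, its parameters `a` and `b`, and the `char_prob` callback are all assumptions made for illustration.

```python
import math


def sigmoid_confidence(similarity, a=1.0, b=0.0):
    """Map a raw similarity score to (0, 1) with a sigmoid; a and b are
    hypothetical parameters that would be fitted on validation data."""
    return 1.0 / (1.0 + math.exp(-(a * similarity + b)))


def match_entry(word, n_segments, char_prob, max_merge=3):
    """Best log-probability of aligning the characters of `word` with the
    n_segments primitive segments, each character covering 1..max_merge
    consecutive segments. char_prob(c, i, j) approximates P(c | segments i..j-1)."""
    neg_inf = float("-inf")
    L = len(word)
    # dp[k][j]: best score for the first k characters covering segments 0..j-1.
    dp = [[neg_inf] * (n_segments + 1) for _ in range(L + 1)]
    dp[0][0] = 0.0
    for k in range(1, L + 1):
        for j in range(1, n_segments + 1):
            for i in range(max(0, j - max_merge), j):
                if dp[k - 1][i] == neg_inf:
                    continue
                p = char_prob(word[k - 1], i, j)
                if p > 0:
                    dp[k][j] = max(dp[k][j], dp[k - 1][i] + math.log(p))
    return dp[L][n_segments]


def rank_lexicon(lexicon, n_segments, char_prob, max_merge=3):
    """Lexicon entries sorted from best to worst match."""
    scored = [(match_entry(w, n_segments, char_prob, max_merge), w) for w in lexicon]
    return [w for _, w in sorted(scored, key=lambda t: -t[0])]
```

In practice `char_prob` could be built by passing a character classifier's similarity score through `sigmoid_confidence`; working with probabilities lets recognition, geometric and lexicon terms be added in the log domain with simple weights.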
Keywords/Search Tags: Uyghur Scripts, Online Handwritten Characters, Online Handwritten Words, Recognition, Character Segmentation, Information Fusion of Segmentation and Recognition