Research On Segmentation And Recognition Of Unconstrained Handwritten Numeral Strings

Posted on: 2011-03-30, Degree: Doctor, Type: Dissertation
Country: China, Candidate: J Ding
GTID: 1118330335986477, Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Character recognition is an important branch of pattern recognition. As an important aspect of Optical Character Recognition (OCR), recognition of handwritten numerals has been widely applied in fields such as mail address reading and automatic processing of bank checks. With the development of technology, the object of handwritten numeral recognition has become unconstrained, continuous numeral strings. Most existing off-line recognition systems adopt single-character-based recognition methods, so character segmentation is a key step in practical handwritten character recognition systems. However, segmentation is difficult because handwritten characters are often distorted or touching. Recent research shows that string segmentation has become central to improving the performance of the whole system, and that it is important to study how to separate characters correctly before they are sent to the recognition engine.

A new structural segmentation method for unconstrained handwritten numeral strings is presented. The segmentation process is completed by a stroke grouping scheme. Principal curve analysis is a new feature extraction method based on nonlinear transformation. Principal curves are smooth, self-consistent curves that pass through the "middle" of the distribution. A stable description of the numeral strokes can be derived from the principal curves. Stroke sets are extracted by principal curve analysis, and a segmentation scheme based on recognition confidence is applied. Experimental results on a set of local Chinese bank checks indicate that the method is effective in segmenting numeral strings.

Fuzzy rules are introduced to address the redundancy of the initial stroke sets and the complexity of the segmentation algorithm.
Fuzzy features of all the strokes are extracted and three types of stroke processing are applied: (1) splitting abnormal strokes, (2) combining stroke fragments, and (3) deleting redundant strokes. In addition, the number of strokes is dramatically decreased, which greatly reduces the complexity of the segmentation algorithm.

Many segmentation schemes have been proposed for unconstrained handwritten numeral strings; they can be classified into three categories: projection- and contour-based schemes, structure-based schemes, and recognition-based schemes. Projection- and contour-based schemes cannot find the correct segmentation points that exist in seriously distorted or smooth intervals of touching contours. In structure-based schemes, the performance is unstable and single digits cannot be distinguished from non-digit patterns (outliers). With an embedded classifier, the optimal segmentation result can be obtained more efficiently. A classifier based on Affinity Propagation (AP) and biomimetic pattern recognition is proposed. It classifies samples by calculating the distance to the corresponding subspace. The training sample space is constructed using the AP algorithm and bionic pattern recognition theory. Posterior probabilities based on the class condition are estimated to reduce the rejection rate caused by overlapping spaces while keeping the misclassification rate low. Experiments have been performed with Concordia University CENPARMI's handwritten digit database and Nanjing University of Science and Technology's handwritten amount database. Experimental results indicate that the proposed classifier has a higher recognition rate than the traditional classifiers.

Recognition confidence plays an important role in the segmentation scheme for numeral strings. To solve the problems of under- and over-segmentation, all the segmentation candidates are described in a probabilistic model.
With an embedded classifier, the optimal segmentation result can be obtained according to the maximum a posteriori (MAP) criterion.
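
To make the MAP selection step concrete, the following is a minimal sketch, not the dissertation's actual model. It assumes that strokes are ordered left to right, that each candidate digit is a group of consecutive strokes, and that candidates are scored independently, so the posterior of a segmentation is the product of the classifier confidences of its groups. The classifier interface, the `max_group` limit, and the toy confidence table are all hypothetical and exist only for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical classifier interface (an assumption, not from the abstract):
# takes a group of stroke ids and returns (digit label, posterior confidence).
Classifier = Callable[[Tuple[int, ...]], Tuple[str, float]]


def map_segmentation(strokes: List[int], classify: Classifier,
                     max_group: int = 3):
    """Return (labels, probability, groups) for the grouping of consecutive
    strokes whose product of confidences is highest."""
    n = len(strokes)
    # best[i] holds (log-probability, labels, groups) for the first i strokes.
    best = [(-math.inf, "", [])] * (n + 1)
    best[0] = (0.0, "", [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_group), end):
            prev_score, prev_labels, prev_groups = best[start]
            if prev_score == -math.inf:
                continue  # no valid segmentation reaches this start point
            group = tuple(strokes[start:end])
            label, conf = classify(group)
            if conf <= 0.0:
                continue  # a zero-confidence candidate can never win
            score = prev_score + math.log(conf)
            if score > best[end][0]:
                best[end] = (score, prev_labels + label, prev_groups + [group])
    score, labels, groups = best[n]
    return labels, math.exp(score), groups


def toy_classifier(group: Tuple[int, ...]) -> Tuple[str, float]:
    """Made-up confidences for three strokes; unknown groups score very low."""
    table = {
        (0,): ("1", 0.6),
        (1,): ("1", 0.5),
        (0, 1): ("4", 0.9),
        (1, 2): ("9", 0.2),
        (2,): ("7", 0.8),
    }
    return table.get(group, ("?", 0.01))
```

With the toy table, `map_segmentation([0, 1, 2], toy_classifier)` merges strokes 0 and 1 into one digit and keeps stroke 2 separate, because that path has the highest product of confidences. A plain product tends to favor segmentations with fewer groups, so a practical system would likely need some normalization or confidence transformation on top of this.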
Keywords/Search Tags: Principal Curve Analysis, Fuzzy Feature, Posterior Possibility, Confidence Transformation, Probabilistic Model, Stroke Grouping, Affinity Propagation Clustering, Bionic Pattern Recognition, Segmentation of Numeral Strings, OCR