
Handwritten Postcode And Address Recognition Based On Cost-sensitive Learning

Posted on: 2015-02-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S J Lv
Full Text: PDF
GTID: 1228330467971509
Subject: Computer application technology
Abstract/Summary:
Over the past decades, document image analysis and recognition has made great progress in both theory and applications, and practical systems have been deployed in many areas. Automatic mail sorting with integrated postcode and address recognition is a typical example. The classifiers used for postcode and address recognition are usually general-purpose commercial recognizers trained on class-balanced datasets, and they do not account for the class imbalance found in real applications. However, the distribution of characters in postcodes and addresses is usually imbalanced, and this imbalance can seriously degrade recognition performance, especially for handwritten postcodes and addresses.

A general-purpose character recognizer aims at high recognition performance on single characters. Postcodes and addresses, however, are character strings, and the goals of postcode and address recognition are a high recognition rate and a low error rate for the whole string. In fact, a general-purpose character classifier with superior single-character performance does not always perform better in postcode or address recognition. At present, recognition performance on printed postcodes and addresses is very high, whereas performance on handwritten postcodes and addresses is still low and needs further improvement. Based on these observations, we try to improve the recognition of handwritten postcodes and addresses by analyzing and taking into account the class distribution of their characters.

For single-object classification, many studies have shown that cost-sensitive learning is an effective way to address the class imbalance problem, but the class imbalance problem in string recognition has received little study. In this dissertation, we concentrate on the class imbalance of characters in postcode and address recognition and address it with cost-sensitive learning methods. The contributions are as follows.

1. Cost-sensitive classifiers are proposed for handwritten postcode recognition. Postcode recognition is formulated as a class imbalance problem and addressed with cost-sensitive learning techniques. Two popular classifiers, the BP neural network and the SVM, are chosen for the cost-sensitive learning studies. Specifically, four cost-sensitive methods (cost-sampling, cost-convergence, rate-adapting and output-adapting) are presented for BP neural networks, and two cost-sensitive methods (cost-sampling and cost-optimization) are proposed for the SVM classifier. To find proper costs for each character, several cost matrices are tested in our experiments. The results suggest that cost-sensitive learning is indeed effective for class-imbalanced postcode analysis and recognition, and that cost-sampling with a proper cost matrix outperforms the other methods for both the BP and SVM classifiers.

2. A cost-sensitive transformation method is proposed for handwritten Chinese address recognition. The approach improves handwritten address recognition by converting a general-purpose handwritten Chinese character recognition engine into a special-purpose one. First, the class probabilities that the character recognition engine assigns to the candidate classes of a sample are transformed into expected costs based on Naive Bayes optimal theoretical predictions. The candidate probabilities are then re-estimated from the expected costs. The transformed results are used in the address recognition system, and several cost matrices are tested to achieve the optimal performance. Experimental results show that the cost-sensitive transformation improves the performance of general-purpose recognition engines on handwritten Chinese address recognition. Moreover, the improvement is significant when a proper cost matrix is employed.

3. A cost-sensitive MQDF classifier, CMQDF, is proposed. MQDF has been used successfully in handwritten Chinese character recognition for years. To resolve the class imbalance problem in address recognition, we propose a cost-sensitive learning method for MQDF. A cost matrix is introduced into the discriminative learning process of MQDF, and minimization of the misclassification cost is used as the convergence criterion. The classifier obtained by this cost-sensitive learning is the cost-sensitive MQDF classifier (CMQDF). The experimental results show that CMQDF is an effective cost-sensitive classifier for the class imbalance problem in a handwritten Chinese address recognition system. We also compare CMQDF with the cost-sensitive transformation method. The results show that CMQDF effectively enhances the reliability of the handwritten address recognition system, while the cost-sensitive transformation method improves the recognition rate much more.

All the ideas and methods proposed in this dissertation have been verified by experiments and applied in practical systems. Specifically, the cost-sampling method for postcode recognition has been applied in the MPS machine, which won the first Science and Technology Award issued by China Post. The cost-sensitive transformation method and the CMQDF classifier are both used successfully in the national science and technology support program (**Detection and Process Platform).
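
The cost-sensitive transformation described in contribution 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The abstract does not give the cost matrix layout or the re-estimation rule, so the orientation `cost[j][i]` (true class j, predicted class i), the softmax over negative expected cost, the function names, and the example numbers are all hypothetical.

```python
import math


def expected_costs(probs, cost):
    """Expected cost of predicting each candidate class.

    probs[j]   : engine's probability that the true class is candidate j
    cost[j][i] : cost of predicting candidate i when the true class is j
                 (this orientation is an assumption)
    """
    n = len(probs)
    return [sum(probs[j] * cost[j][i] for j in range(n)) for i in range(n)]


def reestimate(probs, cost):
    """Map expected costs back to scores that sum to 1.

    Assumption: a softmax over negative expected cost. The abstract does
    not state the re-estimation rule actually used.
    """
    risks = expected_costs(probs, cost)
    weights = [math.exp(-r) for r in risks]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three candidate classes, where misreading a true
# class 1 as anything else is four times as costly as other errors.
probs = [0.5, 0.4, 0.1]
cost = [
    [0.0, 1.0, 1.0],
    [4.0, 0.0, 4.0],
    [1.0, 1.0, 0.0],
]

risks = expected_costs(probs, cost)
new_probs = reestimate(probs, cost)

# The candidate with the lowest expected cost becomes the preferred one.
best = min(range(len(risks)), key=risks.__getitem__)
```

In this made-up example the engine's top candidate is class 0, but class 1 has the lowest expected cost, so the transformed scores rank class 1 first. With a zero-one cost matrix the expected cost of class i reduces to 1 - probs[i], and the ranking is the same as the engine's original one.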
Keywords/Search Tags: handwritten postcode recognition, handwritten address recognition, class imbalance problem, cost-sensitive learning, cost matrix, cost-sensitive transformation, CMQDF