
Research On Dimensionality Reduction And Handwritten Character Recognition

Posted on: 2015-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Yao
GTID: 1108330464468879
Subject: Communication and Information System
Abstract/Summary:
This dissertation studies two key tasks in pattern recognition: dimensionality reduction and handwritten character recognition. For dimensionality reduction, the dissertation focuses on Linear Discriminant Analysis (LDA), one of the most popular dimensionality reduction methods. The relation between LDA and classification accuracy is discussed first. For the class separation problem of LDA, a subset improving method is proposed. A novel constraint on the discriminant vectors, based on the separability of the vectors already obtained, is also introduced to improve the performance of LDA. For handwritten character recognition, a method for selecting the truncation coefficient h of the Modified Quadratic Discriminant Function (MQDF) is proposed. To address the recognition of similar handwritten characters, a new scheme is presented that makes better use of the discriminative information in the features. The main contributions of this dissertation are summarized as follows.

1. The relation between LDA and the Bayes error rate is studied first. For two Gaussian distributions, functional relations between LDA and the Bayes error are derived for the homoscedastic case and for a special heteroscedastic case. For the multi-class case, the LDA criterion and the Bayes error are each derived as functions of the projection vector. The difference between the two functions gives intuitive insight into why LDA is not optimal.

2. LDA is not optimal for a C-class problem when the reduced dimensionality is less than C − 1. To address this, a subset improving method is proposed. The method is based on the observation that LDA can be more efficient and robust when it handles a series of smaller C′-class problems (C′ < C) instead of the full C-class problem. In the method, a subspace is found for each subset rather than for the entire data set. To partition the data set into subsets, a cost matrix is first estimated from the training set with a pre-learned classifier, and a graph cut method is then used to minimize the cost between subsets. The method can also be used to improve other linear discriminant methods. Experimental results on different applications demonstrate both the generality and the effectiveness of the proposed method.

3. A new constraint for the Fisher vectors is proposed based on the separability of the vectors already obtained. The separability relation between a feature vector and its elements is investigated first, showing that a multi-dimensional feature with a homoscedastic Gaussian distribution must be separable if any of its elements is separable. A between-class scatter matrix updating scheme is then presented. To make the discriminant vectors statistically uncorrelated, the algorithm is applied iteratively in the St-orthogonal space of the obtained vectors. The method is extended to more general cases, such as heteroscedastic distributions, by an appropriate kernel function. Experimental results on multiple databases demonstrate the effectiveness of the proposed method.

4. A method for selecting the truncation coefficient h of MQDF is presented. Based on theoretical analysis, the impact of h on MQDF can be divided into two groups. The distribution of each group is first learned from the training set, and the best h is then chosen as the trade-off between the two. A non-parametric method is used to model the probability density of each distribution, so that the best value of h can be selected. Experimental results on the handwritten digit dataset MNIST and the handwritten Chinese character dataset ETL9B show the effectiveness of the proposed method.

5. To address the recognition of similar handwritten characters, a novel scheme is introduced that makes better use of the discriminative information in the features. Unlike methods that extract extra features for similar characters, the scheme first classifies the feature with MQDF and then uses a Support Vector Machine (SVM) to discriminate between similar characters without extra features. The subsets of similar characters are collected using the confusion matrix. A new structure for storing the SVM dictionary is also proposed to enable fast search. Experimental results on ETL9B show that the proposed scheme outperforms methods that extract extra features. This demonstrates that the feature already contains discriminative information for similar characters and that the proposed scheme exploits this information effectively.
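For the homoscedastic two-class case in item 1, a textbook result gives a small illustration. For two equal-prior Gaussians sharing a covariance matrix, the Bayes error is Φ(−Δ/2), where Δ is the Mahalanobis distance between the means. Thresholding the Fisher/LDA projection Σ⁻¹(μ1 − μ0) attains that error. The sketch below is a minimal check of this result under assumed equal priors and known parameters. The function names and numbers are hypothetical, and the dissertation's heteroscedastic and multi-class relations are not reproduced.

```python
import math
import numpy as np

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(mu0, mu1, cov) -> float:
    """Bayes error for two Gaussians sharing `cov` (assumption: equal priors)."""
    d = mu1 - mu0
    delta = math.sqrt(d @ np.linalg.solve(cov, d))  # Mahalanobis distance
    return std_normal_cdf(-delta / 2.0)

def projected_error(w, mu0, mu1, cov) -> float:
    """Error of the midpoint threshold on the 1-D projection w^T x.

    Both projected classes are Gaussian with the same variance w^T cov w,
    so with equal priors the optimal threshold is the midpoint of the means.
    """
    gap = abs(w @ (mu1 - mu0))       # distance between projected means
    sigma = math.sqrt(w @ cov @ w)   # shared projected standard deviation
    return std_normal_cdf(-gap / (2.0 * sigma))

def fisher_direction(mu0, mu1, cov):
    """Fisher/LDA direction for the homoscedastic two-class case."""
    return np.linalg.solve(cov, mu1 - mu0)

# Hypothetical parameters for illustration.
mu0 = np.array([0.0, 0.0, 0.0])
mu1 = np.array([1.0, 2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])

w_lda = fisher_direction(mu0, mu1, cov)
print(bayes_error(mu0, mu1, cov), projected_error(w_lda, mu0, mu1, cov))
```

Any other direction, such as the raw mean difference, gives an error at least as large. The gap between LDA and the Bayes error described in item 1 only appears outside this idealized setting.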
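Item 2 partitions the classes by a graph cut on a cost matrix estimated with a pre-learned classifier. The sketch below is one plausible instance, not the dissertation's algorithm. It assumes that the cost between two classes is their symmetrized confusion count. It also substitutes a spectral relaxation (the sign of the Fiedler vector of the graph Laplacian, a relaxation of ratio cut) for a two-way split. Under this assumption, classes that are often confused stay in the same subset, where a separate LDA could then be fitted. The helper names are hypothetical.

```python
import numpy as np

def symmetric_cost(confusion: np.ndarray) -> np.ndarray:
    """Symmetric, zero-diagonal cost matrix from a class confusion matrix.

    Assumption: cost(i, j) = count of i predicted as j plus the reverse.
    """
    cost = (confusion + confusion.T).astype(float)
    np.fill_diagonal(cost, 0.0)
    return cost

def spectral_bipartition(cost: np.ndarray) -> np.ndarray:
    """Two-way split of the classes with a small cross-subset cost.

    Uses the Fiedler vector of the unnormalized graph Laplacian, a relaxation
    of ratio cut; assumes the cost graph is connected.
    """
    laplacian = np.diag(cost.sum(axis=1)) - cost
    _, vecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return (fiedler >= 0).astype(int)     # subset label (0 or 1) per class

# Hypothetical confusion counts: classes {0, 1, 2} and {3, 4} confuse each other.
confusion = np.array([
    [0, 9, 8, 1, 0],
    [7, 0, 9, 0, 1],
    [8, 8, 0, 1, 0],
    [1, 0, 0, 0, 9],
    [0, 1, 1, 8, 0],
])
labels = spectral_bipartition(symmetric_cost(confusion))
print(labels)
```

Which subset receives label 0 depends on the eigenvector's sign, so only the grouping is meaningful. Applying the split recursively to each subset would yield more than two subsets.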
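The separability result in item 3 can be read as follows. Suppose the separability of two homoscedastic Gaussian classes is measured by the distance between their means in units of the shared covariance. The Cauchy-Schwarz inequality then gives Δ = sqrt(dᵀΣ⁻¹d) ≥ |dᵢ| / sqrt(Σᵢᵢ) for every element i, so a separable element implies a separable vector. The check below verifies this numerically on random covariances. The separability measure is an assumption for illustration and is not necessarily the dissertation's definition.

```python
import numpy as np

def mahalanobis_gap(d: np.ndarray, cov: np.ndarray) -> float:
    """Mean separation of the full feature vector in units of the shared covariance."""
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def element_gaps(d: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Mean separation of each single element in units of its own std. deviation."""
    return np.abs(d) / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
cov = a @ a.T + 0.5 * np.eye(5)   # random shared covariance, positive definite
d = rng.standard_normal(5)        # difference of class means, mu1 - mu0

full_gap = mahalanobis_gap(d, cov)
per_element = element_gaps(d, cov)
print(full_gap, per_element.max())
```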
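For reference on item 4, the sketch below gives MQDF in the form commonly attributed to Kimura et al. It keeps the k largest eigenvalues of each class covariance and replaces the remaining d − k eigenvalues with a single constant. The abstract does not say whether its truncation coefficient h corresponds to k or to that constant, so the sketch exposes both as hypothetical parameters `k` and `delta`. The dissertation's non-parametric selection of h is not implemented here. With k = d, the score reduces to the ordinary quadratic discriminant (Mahalanobis term plus log-determinant).

```python
import numpy as np

def fit_mqdf_class(X: np.ndarray, k: int):
    """Mean and the k largest eigenpairs of one class's sample covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return mu, vals[idx], vecs[:, idx]

def mqdf_score(x, mu, lam, phi, delta) -> float:
    """MQDF discriminant value; smaller means the class is more likely.

    lam, phi: k major eigenvalues and eigenvectors; delta: constant replacing
    the d - k minor eigenvalues (hypothetical parameter name).
    """
    d, k = x.shape[0], lam.shape[0]
    diff = x - mu
    proj = phi.T @ diff                       # coordinates along the major axes
    major = np.sum(proj ** 2 / lam)
    minor = diff @ diff - np.sum(proj ** 2)   # energy outside the major subspace
    return float(major + minor / delta
                 + np.sum(np.log(lam)) + (d - k) * np.log(delta))

def classify(x, models, delta) -> int:
    """Index of the class with the smallest MQDF score."""
    return int(np.argmin([mqdf_score(x, *m, delta) for m in models]))

# Hypothetical two-class data for illustration.
rng = np.random.default_rng(1)
d, k, delta = 6, 3, 0.5
class_means = [np.zeros(d), np.full(d, 5.0)]
models = [fit_mqdf_class(rng.standard_normal((200, d)) + m, k) for m in class_means]
print(classify(class_means[1], models, delta))
```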
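Item 5 cascades MQDF with SVMs for similar characters identified from the confusion matrix. The sketch below shows only the dispatch logic, under several assumptions. A pair counts as similar when its symmetrized confusion count reaches a hypothetical threshold `min_count`. The two best first-stage classes decide whether a second stage runs. A plain Python dict keyed by the sorted class pair stands in for the dissertation's SVM dictionary structure, which the abstract does not describe. The pairwise verifiers are placeholder callables rather than trained SVMs.

```python
import numpy as np

def similar_pairs(confusion: np.ndarray, min_count: float) -> set:
    """Unordered class pairs whose mutual confusion count reaches min_count."""
    sym = confusion + confusion.T
    n = sym.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if sym[i, j] >= min_count}

def cascade_predict(x, scores, pair_classifiers):
    """Stage 1 ranks classes by score (smaller is better, as with MQDF).

    If the two best classes form a known similar pair, stage 2 asks that
    pair's verifier (a stand-in for a trained SVM) to choose between them.
    """
    first, second = (int(c) for c in np.argsort(scores)[:2])
    verifier = pair_classifiers.get((min(first, second), max(first, second)))
    return verifier(x) if verifier is not None else first

# Hypothetical confusion counts from a first-stage classifier.
confusion = np.array([[0, 6, 0, 1],
                      [5, 0, 0, 0],
                      [0, 0, 0, 4],
                      [1, 0, 3, 0]])
pairs = similar_pairs(confusion, min_count=5)
# Placeholder verifiers: always return the lower class index (illustration only).
pair_classifiers = {p: (lambda x, p=p: p[0]) for p in pairs}
print(sorted(pairs))  # [(0, 1), (2, 3)]
print(cascade_predict(None, np.array([2.0, 1.0, 5.0, 9.0]), pair_classifiers))  # 0
```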
Keywords/Search Tags: Linear Discriminant Analysis, Optimal subset division, Constraint for Fisher vectors, Coefficient selection, Cascade classifier