
Study On Generalized Nearest Neighbor Pattern Classification

Posted on: 2010-02-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Ceng    Full Text: PDF
GTID: 1118360305956606    Subject: Control theory and control engineering
Abstract/Summary:
Pattern recognition, with its explicit problem definition, rigorous mathematical foundation, solid theoretical framework, and wide application value, has received increasing attention and has become a central research topic in several other disciplines, such as artificial intelligence, computer vision, machine learning, psychobiology, and cognitive science. In pattern recognition, when the form of the probability density distribution is unknown, or an assumed parametric form fits the actual situation poorly, non-parametric techniques offer an effective classification approach. As a classic non-parametric method, nearest neighbor classification, with its simplicity and high classification accuracy, is widely applied in information retrieval, time series prediction, handwriting and gesture recognition, geological analysis, character recognition, computer-aided diagnosis, and many other fields. Theory has shown that the asymptotic (infinite-sample) error rate of the nearest neighbor rule is less than twice the optimal Bayes error.

Although nearest neighbor classification is widely applied in pattern recognition, it suffers from several problems: (1) Nearest neighbor classification generally achieves good results only when the number of available prototypes is very large relative to the intrinsic dimensionality of the data. In most real situations, however, the number of available prototypes is small, which often leads to a dramatic degradation of classification accuracy. (2) Nearest neighbor classification makes its prediction from local information. Because decisions are made locally, it is quite susceptible to noise, and, like other non-parametric methods, its performance is severely affected by outliers.
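The standard rule discussed above can be sketched as follows. This is a minimal NumPy illustration of classic k-NN with Euclidean distance, not code from the dissertation:

```python
import numpy as np

def nn_classify(X_train, y_train, x, k=1):
    """Classic k-nearest-neighbor rule with Euclidean distance."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all prototypes
    idx = np.argsort(d)[:k]                   # indices of the k closest samples
    labels, counts = np.unique(y_train[idx], return_counts=True)
    return labels[np.argmax(counts)]          # majority vote among neighbors

# Toy example: two well-separated 2-D classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(nn_classify(X, y, np.array([0.05, 0.1])))  # → 0
```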
(3) Nearest neighbor classifiers can produce arbitrarily shaped decision boundaries, and these boundaries have high variability; they are directly affected by the choice of the number of nearest neighbors. (4) Nearest neighbor classification is an instance-based lazy learning method: it requires large memory and expensive computation, giving it high computational complexity. (5) Nearest neighbor classifiers can produce wrong predictions unless an appropriate proximity measure is chosen.

To address these problems, this dissertation focuses on three major questions related to nearest neighbor classifiers: how to improve classification accuracy, how to choose a suitable distance measure, and how to reduce the computational complexity of the nearest neighbor algorithm so as to lower storage requirements and speed up search. These questions are studied in depth. Specifically, the main contributions of this dissertation are as follows:

1. A pseudo nearest neighbor classification rule based on similarities between within-class samples is proposed. Pattern classification succeeds because feature values are very similar for objects in the same category and very different for objects in different categories, and the pseudo nearest neighbor rule makes the most of this essence. In this rule, the influence exerted by the different distances between the test sample and its nearest neighbors within the same class is taken into account: the smaller the distance, the greater the weight of the corresponding neighbor, and the greater its influence on the classification result.
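One plausible reading of the rule just described can be sketched as follows. The abstract does not give the exact weighting scheme, so the decreasing 1/i weights below are an assumption for illustration only:

```python
import numpy as np

def pnn_classify(X_train, y_train, x, k=3):
    """Pseudo nearest neighbor sketch: for each class, combine the k nearest
    within-class distances with decreasing weights and assign the class whose
    weighted distance sum is smallest.  The 1/i weights are an assumption,
    not necessarily the dissertation's exact formula."""
    best_class, best_score = None, np.inf
    for c in np.unique(y_train):
        d = np.sort(np.linalg.norm(X_train[y_train == c] - x, axis=1))[:k]
        w = 1.0 / np.arange(1, len(d) + 1)    # closer neighbors weigh more
        score = np.sum(w * d)                 # distance of the "pseudo neighbor"
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```

Because every class contributes through several within-class neighbors rather than a single sample, a lone noisy point is less likely to flip the decision.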
Theoretical analysis shows that the asymptotic probability of classification error of the pseudo nearest neighbor rule is less than or equal to that of the traditional nearest neighbor rule, and that the rule improves classification performance especially in the large-sample case.

2. To counter the influence of outliers in the training set on classification performance, an extended nearest neighbor classification based on the class mean is proposed. The accuracy of non-parametric classification, and of nearest neighbor classification in particular, degrades dramatically in the presence of outliers. By introducing the class mean vector, a compact description of the ensemble knowledge of the sample data, the proposed method can overcome the influence of outliers and avoid overfitting to the training data, thereby improving classification performance. The improvement is most pronounced when the class mean vectors of the data differ greatly.

3. The performance of nearest neighbor classification degrades dramatically in the small-sample case. To improve it, an extended nearest neighbor classification method based on local means and class statistics is proposed. This method not only overcomes the influence of outliers but also avoids overfitting to the training data, and it markedly improves classification performance, especially in the small-sample case.

4. The major obstacle to the wide application of nearest neighbor classification is its computational complexity.
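To make the mean-based idea above concrete, here is a sketch of one common local-mean variant. It is shown for context only and is not necessarily the dissertation's exact method: for each class, the k nearest within-class neighbors are averaged, and the class with the closest local mean wins. Averaging smooths out isolated outliers:

```python
import numpy as np

def local_mean_classify(X_train, y_train, x, k=3):
    """Local-mean nearest neighbor sketch (a common variant, assumed here
    for illustration): average the k nearest within-class neighbors per
    class and assign the class whose local mean is closest to x."""
    best_class, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]
        local_mean = Xc[idx].mean(axis=0)     # compact local class description
        dist = np.linalg.norm(local_mean - x)
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class
```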
To reduce this complexity, a nearest neighbor editing method based on semi-supervised learning is proposed. The proposed editing method not only eliminates samples with erroneous labels but also cleans up possible overlap among classes, thereby reducing the number of samples in the training set. Most importantly, the proposed editing method can significantly improve the classification performance of the nearest neighbor classifier.
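The abstract does not spell out the semi-supervised procedure; for context, the classic Wilson editing rule, which the described method extends in spirit, removes every training sample whose neighbors disagree with its label:

```python
import numpy as np

def wilson_edit(X, y, k=3):
    """Classic Wilson editing (shown for context; the dissertation's
    semi-supervised variant differs): drop each training sample that is
    misclassified by a k-NN vote over the remaining samples."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the sample itself
        idx = np.argsort(d)[:k]
        labels, counts = np.unique(y[idx], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:  # neighbors agree with the label
            keep.append(i)
    return X[keep], y[keep]
```

A smaller, cleaner training set both speeds up later nearest neighbor searches and removes the mislabeled points that drag accuracy down.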
Keywords/Search Tags: Nearest neighbor classification, pseudo nearest neighbor, local mean, extended nearest neighbor, distance measure, nearest neighbor editing