
Research On Several Pattern Classification Methods Based On K-nearest Neighbor Criterion

Posted on: 2019-09-30    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H X Ma    Full Text: PDF
GTID: 1368330548463966    Subject: Computer software and theory

Abstract/Summary:
The K-Nearest Neighbor (KNN) rule is widely used in pattern recognition, machine learning, and data mining because it is intuitive, simple, effective, and easy to implement; it was also selected as one of the top 10 most influential data mining algorithms at the IEEE International Conference on Data Mining (ICDM). Current research on the k-nearest neighbor rule focuses mainly on improving the classification accuracy for test samples, reducing the complexity of the neighbor search algorithm, and selecting the neighbors themselves. This thesis focuses on the k-nearest neighbor criterion in classification. First, a pseudo nearest centroid neighbor classification algorithm (PNCN) is proposed, which finds the pseudo nearest centroid neighbors of a test sample in order to reduce the influence of outliers when classifying small-sample data sets. Second, a harmonic mean distance-based k-nearest neighbor classification algorithm (HMDKNN) is proposed, which computes multiple harmonic mean distances of the test sample in order to reduce the classifier's sensitivity to the choice of k. Finally, by computing the sparse representation coefficients of the training samples to automatically select k representative nearest neighbors, a coefficient-weighted k-nearest neighbor algorithm (CWKNN) and a residual-weighted k-nearest neighbor algorithm (RWKNN) are proposed, which overcome the limitations of finding neighbors with the plain Euclidean distance and improve classification performance.

The main research work and innovations of the thesis are as follows:

1. A pseudo nearest centroid neighbor classification algorithm (PNCN) is proposed to address KNN's vulnerability to outliers on small-sample data sets. PNCN uses the k local mean points corresponding to the k nearest neighbors of each class in the training set to compute the pseudo nearest centroid neighbors, so both the similarity and the spatial distribution of the test sample's nearest neighbors are taken into account. Experimental results show that PNCN achieves a higher classification accuracy than other similar classification algorithms on both real data sets and noisy small-sample data sets, and that it is robust to the choice of the neighborhood size k.

2. To improve the performance of neighbor classification, a harmonic mean distance-based k-nearest neighbor classification algorithm (HMDKNN) is proposed, built on the local mean vectors and multiple harmonic mean distances of the k nearest neighbors in each class. HMDKNN first computes the local mean vectors of the k nearest neighbors in each class, then calculates the nested harmonic mean distances over the k local mean vectors of each class, and finally assigns the test sample to the class with the minimum nested harmonic mean distance. Because it uses multiple local mean vectors, multiple harmonic mean distances, and nested harmonic mean distances per class, HMDKNN further reduces the sensitivity to k and improves classification accuracy compared with other similar algorithms, and it remains robust as k varies, whether on UCI and KEEL real data sets or on artificial data sets, noisy data sets, and higher-dimensional time-series data sets.

3. Since sparse representation coefficients capture the similarity between data points and carry potentially discriminative information, k representative nearest neighbors can be selected by computing the sparse representation coefficients of the test sample over all training samples. The thesis proposes two weighted nearest neighbor classification algorithms based on sparse coefficients: the coefficient-weighted k-nearest neighbor classification algorithm (CWKNN) and the residual-weighted k-nearest neighbor
classification algorithm (RWKNN). In CWKNN, the k nearest neighbors of the test sample are selected by their sparse coefficients, and each neighbor's sparse representation coefficient is used as its weight in the weighted majority vote of the KNN classification decision. In RWKNN, the k nearest neighbors of the test sample are likewise selected by sparse coefficients, the reconstruction residuals between the k nearest neighbors and the test sample are computed, and finally a vote weighted by these reconstruction residuals classifies the test sample. Experimental results show that, compared with other similar classification algorithms, CWKNN and RWKNN achieve better classification performance on real, artificial, and noisy data sets, and that their classifications remain robust as k varies.
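The local-centroid idea behind PNCN can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis's exact formulation: it assumes Euclidean distance, class-wise running centroids of the first 1..k neighbors, and the rank-based weights 1/i of the pseudo nearest neighbor rule.

```python
from math import dist  # Python 3.8+: Euclidean distance between two points

def pncn_predict(train, labels, x, k=3):
    """Classify x by a pseudo nearest centroid neighbor rule (sketch).

    For each class, take the k nearest training points to x, form the
    running centroids of the first 1, 2, ..., k of them, and score the
    class by a rank-weighted sum of x's distances to those k local
    centroids (closer-ranked centroids get larger weight 1/i). The
    class with the smallest score wins.
    """
    best_class, best_score = None, float("inf")
    for c in set(labels):
        # k nearest neighbors of x within class c
        pts = sorted((p for p, y in zip(train, labels) if y == c),
                     key=lambda p: dist(p, x))[:k]
        score, acc = 0.0, [0.0] * len(x)
        for i, p in enumerate(pts, start=1):
            acc = [a + v for a, v in zip(acc, p)]
            centroid = [a / i for a in acc]  # local mean of first i neighbors
            score += dist(centroid, x) / i   # rank-based weight 1/i
        if score < best_score:
            best_class, best_score = c, score
    return best_class

# Toy 2-D example with two well-separated classes
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # class 0
         (2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]   # class 1
labels = [0, 0, 0, 1, 1, 1]
print(pncn_predict(train, labels, (0.15, 0.1), k=3))  # → 0
```

Because each class is scored through its own local centroids rather than raw training points, a single outlier far from the class mass contributes little to the low-rank centroids, which is the robustness property the thesis exploits.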
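The harmonic-mean-distance decision of HMDKNN can be sketched in a simplified (non-nested) form. The thesis's nested variant is more elaborate; this sketch assumes Euclidean distance and scores each class by the plain harmonic mean of the test sample's distances to the k running local mean vectors.

```python
from math import dist  # Python 3.8+: Euclidean distance between two points

def hmd_knn_predict(train, labels, x, k=3):
    """Classify x by a simplified harmonic-mean-distance rule (sketch).

    For each class, take the k nearest neighbors of x in that class,
    form the running local mean vectors of the first 1..k of them, and
    score the class by the harmonic mean of x's distances to those k
    local means. The harmonic mean is dominated by the smallest
    distance, which damps the influence of far-away neighbors and of
    the exact choice of k. The smallest score wins.
    """
    best_class, best_score = None, float("inf")
    for c in set(labels):
        pts = sorted((p for p, y in zip(train, labels) if y == c),
                     key=lambda p: dist(p, x))[:k]
        acc, inv_sum = [0.0] * len(x), 0.0
        for i, p in enumerate(pts, start=1):
            acc = [a + v for a, v in zip(acc, p)]
            local_mean = [a / i for a in acc]
            inv_sum += 1.0 / max(dist(local_mean, x), 1e-12)  # guard /0
        score = len(pts) / inv_sum  # harmonic mean distance
        if score < best_score:
            best_class, best_score = c, score
    return best_class

train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # class 0
         (2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]   # class 1
labels = [0, 0, 0, 1, 1, 1]
print(hmd_knn_predict(train, labels, (1.8, 2.1), k=3))  # → 1
```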
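Computing the sparse representation coefficients themselves requires an l1-minimization solver, which is beyond a short sketch; assuming those coefficients are already available, the weighted-vote decision step of CWKNN reduces to the following (the function name and interface here are hypothetical):

```python
def cwknn_vote(neighbor_labels, coeffs):
    """Weighted majority vote (CWKNN decision step, sketch).

    neighbor_labels: class labels of the k neighbors selected by the
    largest sparse representation coefficients.
    coeffs: the matching (assumed precomputed, non-negative)
    coefficients, used as vote weights.
    Returns the class with the largest total weight.
    """
    totals = {}
    for y, w in zip(neighbor_labels, coeffs):
        totals[y] = totals.get(y, 0.0) + w
    return max(totals, key=totals.get)

# Two weakly-weighted "a" neighbors lose to one strongly-weighted "b"
print(cwknn_vote(["a", "a", "b"], [0.2, 0.1, 0.6]))  # → b
```

RWKNN differs only in the weights: instead of the coefficients, it uses weights derived from the reconstruction residuals between each selected neighbor and the test sample.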
Keywords/Search Tags:nearest neighbor classification, k-nearest neighbors, pseudo nearest centroid neighbors, harmonic mean distance, sparse coefficients