
Research On K Nearest Neighbor Algorithm Based On Class Division And Neighbor Selection

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2428330626462884
Subject: Mathematics
Abstract/Summary:
In the era of big data, how to mine useful information from massive data has become a focus of public concern, and data mining technology provides an effective way to address it. Data classification is an important task in data mining, and the k nearest neighbor (kNN) classifier is widely used because it is simple and easy to implement. At the same time, kNN has well-known drawbacks: it is sensitive to the choice of k, it is easily affected by imbalanced data, and its distance measure is often too simple. Starting from the basic kNN classifier and the local-mean-based kNN classifier, this thesis improves the algorithm by means of swarm intelligence optimization and sparse representation, overcoming several defects of the original methods and effectively improving classification performance. The specific research contents and results are as follows:

1. To address the basic grasshopper optimization algorithm's tendency to fall into local optima and its limited convergence accuracy, an improved grasshopper optimization algorithm is proposed. First, a chaotic opposition-based learning initialization strategy generates a better initial population; a natural exponential decline strategy balances the algorithm's exploration and exploitation; and a Gaussian mutation strategy alleviates premature convergence to local optima. Simulation experiments on 10 benchmark test functions show that the improved algorithm achieves higher convergence efficiency and solution accuracy. Then, because the distance-weighted kNN algorithm depends heavily on the distance measure and generates its weights somewhat arbitrarily, a distance-weighted kNN algorithm based on the improved grasshopper optimizer is proposed: the optimizer and the distance measure are used to generate an optimal set of weights for the weighted voting process. Tests on 6 data sets from the UCI repository show that the algorithm is insensitive to changes in k and improves classification accuracy.

2. In the standard kNN algorithm every attribute has the same influence on the classification process, so weakly correlated features can cause classification errors on new data. In addition, the traditional majority voting rule misclassifies to varying degrees when faced with imbalanced data sets and outliers. To address these problems, a kNN algorithm based on mutual information and local means is proposed. First, attributes are weighted by their degree of correlation as measured by mutual information; second, a comprehensive classification strategy is established from local means and class contributions. Ten-fold cross validation on five UCI data sets verifies that the proposed algorithm attains high accuracy and stability across different data sets.

3. The multi-local-means-based k-harmonic nearest neighbor classifier assigns the same weight to all attributes, ignoring the differing contribution rates of individual attributes; moreover, it selects neighbor samples by distance ranking alone, without fully considering the neighborhood distribution of the samples. To solve these problems, a harmonic nearest neighbor algorithm based on attribute weights and sparse coefficients is proposed. First, a comprehensive attribute weight defined from mutual information and gain ratio is used to weight the distance formula; second, a two-step neighbor selection strategy exploits the strong pattern recognition ability of sparse coefficients to select neighbor samples. The algorithm is tested on 12 standard data sets and 2 noisy data sets from the UCI and KEEL repositories against 6 classical algorithms; the results show that the improved algorithm achieves higher classification accuracy while also being more robust.
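To make the first contribution concrete, here is a minimal sketch of the distance-weighted voting step that the improved grasshopper optimizer would tune. The function name and the idea of passing the per-rank neighbor weights as an external parameter are illustrative assumptions, not the thesis's exact formulation; in the thesis those weights are the output of the optimizer.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k, weights):
    """Distance-weighted kNN vote. The per-rank neighbor weights are
    supplied externally (in the thesis they would be produced by the
    improved grasshopper optimizer; here they are simply an input)."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each training point
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbors
    votes = {}
    for rank, i in enumerate(idx):            # closer ranks receive their own weight
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + weights[rank]
    return max(votes, key=votes.get)          # class with the largest weighted vote
```

Separating the weight vector from the voting rule is what allows a population-based optimizer to search over weight vectors while the classifier itself stays fixed.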
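The mutual-information attribute weighting of the second contribution can be sketched as follows. This is an assumed, simplified version: features are discretized by equal-width binning and each attribute's weight is its mutual information with the class label, normalized to sum to one; the thesis's exact estimator and normalization may differ.

```python
import numpy as np

def mi_feature_weights(X, y, bins=5):
    """Weight each attribute by its mutual information with the class
    label (features discretized by equal-width binning), normalized so
    the weights sum to 1."""
    n, d = X.shape
    w = np.zeros(d)
    for j in range(d):
        # discretize feature j into equal-width bins
        edges = np.histogram_bin_edges(X[:, j], bins=bins)[1:-1]
        xb = np.digitize(X[:, j], edges)
        mi = 0.0
        for xv in np.unique(xb):
            for yv in np.unique(y):
                pxy = np.mean((xb == xv) & (y == yv))     # joint probability
                px, py = np.mean(xb == xv), np.mean(y == yv)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))   # MI contribution
        w[j] = mi
    return w / w.sum() if w.sum() > 0 else np.full(d, 1.0 / d)
```

The resulting weights can then scale each coordinate in the distance computation, so that attributes carrying little information about the class contribute little to neighbor selection.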
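For the third contribution, the baseline being extended is the multi-local-means k-harmonic nearest neighbor rule. The sketch below shows that baseline only, under the common formulation (per class, the i-th local mean is the centroid of that class's i nearest neighbors, and the class with the smallest harmonic mean of distances to its local means wins); the thesis's attribute weighting and sparse-coefficient neighbor selection are not reproduced here.

```python
import numpy as np

def khnn_predict(X_train, y_train, x, k):
    """Multi-local-means k-harmonic nearest neighbor rule: per class,
    build k local mean vectors from that class's k nearest neighbors,
    then assign x to the class whose harmonic mean of distances to its
    local means is smallest."""
    best_cls, best_score = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        nn = Xc[np.argsort(d)[:k]]                       # class-wise k nearest neighbors
        # i-th local mean = centroid of the i nearest neighbors (cumulative means)
        means = np.cumsum(nn, axis=0) / np.arange(1, len(nn) + 1)[:, None]
        dist = np.linalg.norm(means - x, axis=1)
        # harmonic mean of distances to the local means
        score = len(dist) / np.sum(1.0 / np.maximum(dist, 1e-12))
        if score < best_score:
            best_cls, best_score = c, score
    return best_cls
```

Because the harmonic mean is dominated by the smallest distances, a single distant (outlying) local mean has limited influence on the class score, which is the robustness property the thesis builds on.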
Keywords/Search Tags:k nearest neighbor algorithm, Grasshopper optimization algorithm, Mutual information, Local mean, Sparse coefficient