
Research On KNN Algorithm Based On Clustering Of Training Set And Its Application

Posted on: 2018-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Huan
Full Text: PDF
GTID: 2348330518497613
Subject: Computational Mathematics
Abstract/Summary:
This thesis studies the K-Nearest-Neighbor (KNN) algorithm, a classic classification algorithm, and its application. KNN computes the similarity between the sample to be classified and every sample in the training set, with similarity usually measured by Euclidean distance. On large data sets this requires a huge amount of computation, which greatly reduces the efficiency of the algorithm. In addition, the choice of k affects the classification result, and many experiments are needed to determine the best value. To address these shortcomings, the main work of this thesis is as follows.

First, we use an improved LLE (locally linear embedding) algorithm to reduce the dimension of the training set and the FCM (fuzzy c-means) algorithm to cluster it. The k nearest neighbors are then searched only among the training samples in the few clusters closest to the test sample, which removes the need to compute the distance between the test sample and every training sample. We also design a method that assigns weights to the k nearest neighbors according to their distance, which reduces the influence of the choice of k on the classification result. Combining KNN with dimension reduction, FCM clustering, and distance-based weighting, Chapter 3 proposes the distance-weighted KNN algorithm based on training set clustering. Experiments on simulated and real data sets compare the new algorithm with KNN and verify that the new algorithm is more efficient.

Second, data are usually distributed nonuniformly. To handle class skew in the training set, Chapter 4 proposes the density-weighted KNN algorithm based on training set clustering.
We use the improved LLE algorithm to reduce the dimension of the training set and the K-means algorithm to cluster it. After finding the k nearest neighbors of the test sample, we weight each neighbor with a density-based measure, and these weights determine the class of the test sample. Experiments on simulated data and UCI data sets compare the new algorithm with KNN and with the distance-weighted KNN algorithm based on training set clustering, and verify that the new algorithm achieves higher classification accuracy on class-skewed data.

Third, we apply the two improved KNN algorithms to the KDD Cup 1999 data. The improved algorithms are verified to be more accurate than the KNN algorithm. A comparison of the two shows that the density-weighted KNN algorithm based on training set clustering performs better on nonuniformly distributed data.

Finally, the thesis summarizes the work done and proposes topics for further study.
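
The cluster-restricted, distance-weighted neighbor search described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it skips the LLE dimension-reduction step, uses a plain K-means in place of FCM, and takes inverse distance as the neighbor weight, since the abstract does not give the exact weighting formula. The names `kmeans`, `clustered_weighted_knn`, `n_near`, and `eps` are hypothetical.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-means (a stand-in for FCM). Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen training samples.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance of every sample to every center: shape (n_samples, n_clusters).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    # Final assignment against the final centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)

def clustered_weighted_knn(x, X, y, centers, labels, k=5, n_near=2, eps=1e-12):
    """Classify x using only training samples from the n_near closest clusters."""
    # Step 1: find the clusters whose centers are closest to the test sample.
    near = np.argsort(np.linalg.norm(centers - x, axis=1))[:n_near]
    cand = np.flatnonzero(np.isin(labels, near))
    if cand.size == 0:  # fall back to the whole training set
        cand = np.arange(len(X))
    # Step 2: k nearest neighbors among the candidates only.
    d = np.linalg.norm(X[cand] - x, axis=1)
    order = np.argsort(d)[:k]
    # Step 3: distance-weighted vote (assumed weight: inverse distance).
    votes = {}
    for label, w in zip(y[cand][order], 1.0 / (d[order] + eps)):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

The clustering is done once on the training set; each query then computes distances to the cluster centers and to the candidates in the selected clusters rather than to all training samples. Because closer neighbors contribute larger votes, distant neighbors admitted by a large k have less influence on the result.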
Keywords/Search Tags: Classification algorithm, Data dimension reduction, KNN algorithm, LLE algorithm, FCM algorithm, K-means algorithm, Weighting based on distance, Weighting based on density