Research On KNN Algorithm Based On Clustering Of Training Set And Its Application

Posted on:2018-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y B Huan

Full Text:PDF

GTID:2348330518497613

Subject:Computational Mathematics

Abstract/Summary:

This thesis focuses on the K-Nearest-Neighbor (KNN) algorithm, a classic classification algorithm, and on its application. KNN computes the similarity between the sample to be tested and every sample in the training set, with similarity usually expressed as Euclidean distance. On large data sets this produces a huge amount of computation, which greatly reduces the efficiency of the algorithm. In addition, the choice of k affects the classification results, and many experiments are needed to determine the best value. To address these shortcomings, the main work of this thesis is as follows.

Firstly, we use an improved LLE algorithm to reduce the dimension of the training set and then cluster the training set with the FCM algorithm. The k nearest neighbors are searched only among the training samples in the several clusters closest to the sample to be tested, which removes the need to compute the distance between the testing sample and every training sample. We also design a method that weights the k nearest neighbors according to distance, which weakens the influence of the choice of k on the classification result. Combining KNN with dimension reduction, FCM clustering, and distance-based weighting, Chapter 3 proposes the distance-weighted KNN algorithm based on training set clustering. Experiments on simulated data and real data sets compare the new algorithm with KNN and verify that the new algorithm is more efficient.

Secondly, data are usually nonuniformly distributed. For training sets with class skew, Chapter 4 proposes the density-weighted KNN algorithm based on training set clustering. We use the improved LLE algorithm to reduce the dimension of the training set and cluster it with the K-means algorithm. After finding the k nearest neighbors of the testing sample, we weight them with a density-based method, and these weights determine the class of the testing sample. Experiments on simulated data and UCI data sets compare the new algorithm with KNN and with the distance-weighted KNN algorithm based on training set clustering, and verify that the new algorithm has higher classification accuracy on class-skewed data.

Thirdly, we apply the two improved KNN algorithms to the KDD Cup 1999 data. Both are verified to be more accurate than the KNN algorithm. The two improved algorithms are also compared and analyzed: the density-weighted KNN algorithm based on training set clustering gives better results on nonuniformly distributed data.

Finally, the thesis summarizes the work done and puts forward topics for further study.
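The idea of searching for neighbors only inside the clusters nearest to the test sample, and then weighting their votes by distance, can be illustrated with a minimal sketch. This is not the thesis's implementation: the LLE dimension-reduction step is omitted, a plain hard K-means stands in for FCM, and the inverse-distance weight `1 / (d + eps)` is an assumed weighting, since the abstract does not give the exact formula. The function names (`kmeans`, `cluster_pruned_knn`) and the parameters `n_probe` and `eps` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda c: euclidean(point, centers[c]))


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's K-means (stand-in for FCM in this sketch).

    Assumption: the first `n_clusters` points serve as initial centers.
    """
    centers = [list(p) for p in points[:n_clusters]]
    for _ in range(n_iter):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment consistent with the returned centers.
    assign = [nearest_center(p, centers) for p in points]
    return centers, assign


def cluster_pruned_knn(query, points, labels, centers, assign,
                       k=3, n_probe=1, eps=1e-9):
    """Classify `query` using neighbors drawn only from the nearest clusters.

    n_probe: how many nearest clusters to search (hypothetical parameter).
    Votes are weighted by 1 / (distance + eps), an assumed weighting.
    """
    # Rank clusters by distance from the query to their centers.
    ranked = sorted(range(len(centers)),
                    key=lambda c: euclidean(query, centers[c]))
    probe = set(ranked[:n_probe])
    # Only samples in the probed clusters are distance candidates.
    candidates = [i for i, a in enumerate(assign) if a in probe]
    if not candidates:
        raise ValueError("no training samples in the probed clusters")
    neighbors = sorted(candidates,
                       key=lambda i: euclidean(query, points[i]))[:k]
    votes = defaultdict(float)
    for i in neighbors:
        votes[labels[i]] += 1.0 / (euclidean(query, points[i]) + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    lbl = ["a", "a", "a", "b", "b", "b"]
    ctr, asg = kmeans(pts, n_clusters=2)
    print(cluster_pruned_knn((4.8, 5.0), pts, lbl, ctr, asg, k=3))
```

With `n_probe` clusters searched out of many, the per-query cost drops from one distance per training sample to one distance per cluster center plus one per sample in the probed clusters. Because closer neighbors dominate the vote, the predicted class is less sensitive to the exact value of `k`.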

Keywords/Search Tags:

Classification algorithm, Data dimension reduction, KNN algorithm, LLE algorithm, FCM algorithm, K-means algorithm, Weighting based on distance, Weighting based on density

Related items

1	Study On Naive Bayesian Algorithm Based On Attributes Weighting And Reduction
2	Research On Localization Algorithm Based On Improved DV-Hop And Node Density Weighting
3	Research On Text Classification Based On Feature Selection And Feature Weighting Algorithm
4	Research On Intelligent Recommendation Algorithm Based On Clustering
5	The Research Of Naïve Bayes Classification Algorithm Based On Attribute Reduction And Attribute Weighting
6	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
7	Feature Weighting And Distance Metric Learning For Multiple-Instance Classification
8	The Improvement Of CSMA/CA Algorithm Based On Dynamic Weighting Algorithm
9	Research On The Improvement Of C-means Clustering Algorithm
10	Research And Application Of K-means Algorithm Based On Density And Distance