
Optimization Of KNN Classification Algorithm In High Dimensional Data

Posted on: 2020-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Kuang
Full Text: PDF
GTID: 2428330596495491
Subject: Software engineering

Abstract/Summary:
Over the past decade, data mining has gradually become a hot topic among researchers. Owing to its simple structure and easy implementation, the KNN algorithm has been widely applied in many fields, such as financial analysis. However, the traditional KNN algorithm must compute the similarity between each unclassified sample point and all of the training sample points in order to obtain the top K nearest neighbors and determine the category of the point to be tested. When processing high-dimensional big data, the computational overhead of the algorithm therefore becomes extremely high. In addition, if the value of K is too large, the set of nearest neighbors may contain sample points with low similarity, which lowers accuracy and increases computation; if the value of K is too small, sample points with high similarity to the unclassified point may be excluded, which also lowers accuracy. In view of these problems, the research contents of this paper are as follows.

To address the redundant computation in the KNN algorithm, this paper proposes a clustering-based Annular K-Nearest Neighbor algorithm (AKNN). The algorithm consists of four parts: data processing, updating the distances between training points and cluster centers, constructing an annular filter, and KNN classification. During data processing, the algorithm can select a clustering method suited to the characteristics of the training set and then obtain the center point of each cluster. The crucial step is to construct an annular filter for each test point and, with its help, divide the training points around the current test point into peripheral training points and inner training points. For classification, the algorithm uses the inner training points instead of the full training set, so that it can reduce the overhead
of computation. Finally, the proposed AKNN algorithm is verified experimentally on 9 public data sets. The experimental results show that AKNN improves accuracy by 3% on average compared with LC-KNN and RC-KNN, and reduces the amount of computation by 51% on average compared with KNN.

To address the problem of a fixed K value, this paper proposes a K-value Adaptive KNN algorithm based on an Annular Filter (AAKNN). The algorithm consists of five parts: data processing, updating the distances between training points and cluster centers, constructing an annular filter, constructing sparse vectors, and KNN classification. The core idea is to use sparse vectors, which better express the similarity information among data, to dynamically select the K nearest neighbors of each test point, thereby improving the accuracy of the algorithm. Moreover, the algorithm can not only select a different K value according to the characteristics of each test point, but the annular filter also avoids excessive memory usage. Finally, the proposed AAKNN algorithm is verified experimentally on 6 public data sets. The experimental results show that AAKNN and CM-KNN improve accuracy by 2% on average compared with the other four algorithms (KNN, AKNN, LC-KNN, and RC-KNN), and that AAKNN reduces memory consumption by 79% compared with CM-KNN.
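The annular-filter pruning described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes Euclidean distance, replaces the configurable clustering step with simple nearest-center assignment, and all names (`build_clusters`, `aknn_classify`) are illustrative. The key step is the triangle-inequality bound, which lets the cached point-to-center distances rule out peripheral training points without computing their distance to the query.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_clusters(points, labels, centers):
    """Assign each training point to its nearest center and cache the
    point-to-center distance, which the annular filter reuses later."""
    clusters = {i: [] for i in range(len(centers))}
    for p, y in zip(points, labels):
        i = min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
        clusters[i].append((p, y, euclidean(p, centers[i])))
    return clusters

def aknn_classify(query, clusters, centers, k):
    # Distance from the query to every cluster center.
    dq = [euclidean(query, c) for c in centers]
    # Seed the candidate list with members of the nearest cluster.
    nearest = min(range(len(centers)), key=lambda i: dq[i])
    cand = sorted((euclidean(query, p), y) for p, y, _ in clusters[nearest])[:k]
    radius = cand[-1][0]  # current k-th nearest distance
    for i, members in clusters.items():
        if i == nearest:
            continue
        for p, y, dc in members:
            # Triangle inequality: d(query, p) >= |d(query, c) - d(p, c)|.
            # A point whose cached center distance dc lies outside the
            # annulus [dq - radius, dq + radius] cannot enter the top k,
            # so its true distance to the query is never computed.
            if abs(dq[i] - dc) > radius:
                continue
            d = euclidean(query, p)
            if d < radius:
                cand = sorted(cand + [(d, y)])[:k]
                radius = cand[-1][0]
    # Majority vote over the surviving k nearest neighbors.
    return Counter(y for _, y in cand).most_common(1)[0][0]
```

For a query near one cluster, every training point in the far cluster is rejected by the bound alone, which is the source of the reduced computation reported above.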
Keywords/Search Tags:Annular Filter, Clustering, Classification, Triangular Inequality, Sparse Vectors