
Optimization Of KNN Classification Algorithm In High Dimensional Data

Posted on: 2020-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Kuang
Full Text: PDF
GTID: 2428330596495491
Subject: Software engineering

Abstract/Summary:
Over the past decade, data mining has gradually become a hot topic among researchers. Owing to its simple structure and easy implementation, the KNN algorithm has been widely applied in many fields, such as financial analysis. However, the traditional KNN algorithm must compute the similarity between each unclassified sample point and all of the training sample points in order to obtain the top K nearest neighbors and determine the category of the point to be tested. When processing high-dimensional big data, the computational overhead of the algorithm therefore becomes extremely high. In addition, if the value of K is too large, the set of nearest neighbors may contain sample points with low similarity, which lowers accuracy and increases computation; if the value of K is too small, sample points with high similarity to the unclassified point may be excluded, which also lowers accuracy. In view of these problems, the research contents of this paper are as follows.

To address the redundant computation in the KNN algorithm, this paper proposes a clustering-based Annular K-Nearest Neighbor algorithm (AKNN). The algorithm consists of four parts: data processing, updating the distances between training points and cluster centers, constructing an annular filter, and KNN classification. During data processing, the algorithm can select a clustering method suited to the characteristics of the training set and then obtain the center point of each cluster. The crucial step is to construct an annular filter for each test point and, with its help, divide the training points around the current test point into peripheral training points and inner training points. For classification, the algorithm uses the inner training points instead of the full training set, so that it can reduce the overhead
of computation. Finally, the proposed AKNN algorithm is verified experimentally on 9 public data sets. The experimental results show that AKNN improves accuracy by 3% on average compared with LC-KNN and RC-KNN, and reduces the amount of computation by 51% on average compared with KNN.

To address the problem of a fixed K value, this paper proposes a K-value Adaptive KNN algorithm based on an Annular Filter (AAKNN). The algorithm consists of five parts: data processing, updating the distances between training points and cluster centers, constructing an annular filter, constructing sparse vectors, and KNN classification. The core idea is to use sparse vectors, which better express the similarity information among data, to dynamically select the K nearest neighbors of each test point, thereby improving the accuracy of the algorithm. Moreover, the algorithm can not only select a different K value according to the characteristics of each test point, but the annular filter also avoids excessive memory usage. Finally, the proposed AAKNN algorithm is verified experimentally on 6 public data sets. The experimental results show that AAKNN and CM-KNN improve accuracy by 2% on average compared with the other four algorithms (KNN, AKNN, LC-KNN, and RC-KNN), and that AAKNN reduces memory consumption by 79% compared with CM-KNN.
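The annular-filter pruning described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes Euclidean distance, replaces the configurable clustering step with simple nearest-center assignment, and all names (`build_clusters`, `aknn_classify`) are illustrative. The key step is the triangle-inequality bound, which lets the cached point-to-center distances rule out peripheral training points without computing their distance to the query.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_clusters(points, labels, centers):
    """Assign each training point to its nearest center and cache the
    point-to-center distance, which the annular filter reuses later."""
    clusters = {i: [] for i in range(len(centers))}
    for p, y in zip(points, labels):
        i = min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
        clusters[i].append((p, y, euclidean(p, centers[i])))
    return clusters

def aknn_classify(query, clusters, centers, k):
    # Distance from the query to every cluster center.
    dq = [euclidean(query, c) for c in centers]
    # Seed the candidate list with members of the nearest cluster.
    nearest = min(range(len(centers)), key=lambda i: dq[i])
    cand = sorted((euclidean(query, p), y) for p, y, _ in clusters[nearest])[:k]
    radius = cand[-1][0]  # current k-th nearest distance
    for i, members in clusters.items():
        if i == nearest:
            continue
        for p, y, dc in members:
            # Triangle inequality: d(query, p) >= |d(query, c) - d(p, c)|.
            # A point whose cached center distance dc lies outside the
            # annulus [dq - radius, dq + radius] cannot enter the top k,
            # so its true distance to the query is never computed.
            if abs(dq[i] - dc) > radius:
                continue
            d = euclidean(query, p)
            if d < radius:
                cand = sorted(cand + [(d, y)])[:k]
                radius = cand[-1][0]
    # Majority vote over the surviving k nearest neighbors.
    return Counter(y for _, y in cand).most_common(1)[0][0]
```

For a query near one cluster, every training point in the far cluster is rejected by the bound alone, which is the source of the reduced computation reported above.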
Keywords/Search Tags:Annular Filter, Clustering, Classification, Triangular Inequality, Sparse Vectors