Font Size: a A A

Research And Application Of Fast KNN Algorithm With Low Time Complexity

Posted on:2020-12-09Degree:MasterType:Thesis
Country:ChinaCandidate:Y YuFull Text:PDF
GTID:2518306305997949Subject:Systems analysis and integration
Abstract/Summary:PDF Full Text Request
KNN algorithm is a very classical classification algorithm,which mainly solves the classification problem of unknown attribute samples.In the process of classification,it is necessary to calculate the distance between the unknown attribute samples and each known attribute samples.Therefore,in the application of massive high-dimensional data,the traditional KNN algorithm has a serious challenge of high computational complexity.This paper studies this shortcoming.Distance-based outlier detection algorithm is one of the commonly used outlier detection algorithms,but it has a high computational complexity in the face of massive high-dimensional data.This paper also studies this shortcoming.The main work of this paper is as follows:Firstly,a fast KNN algorithm with low time complexity is proposed to overcome the low classification efficiency of traditional KNN algorithm in the face of massive high-dimensional data.This algorithm is different from the traditional KNN algorithm in the following aspects:first,select some samples randomly from the unknown attribute samples set for classification.And then,select a random sample from the rest of the unknown attributes,calculate its distance from all the classified samples and find the nearest classified sample.Finnally,a hypersphere is formed by taking the nearest classified sample as the center of the circle,and taking 2 times of the distance between them and plus the distance of the farthest sample in the k nearest neighbor of the nearest classified sample is the radius.Out of hypersphere Samples do not need to calculate the distance with the sample to be classified,thus reducing the amount of calculation and achieving the effect of reducing the computational complexity.Secondly,a fast KNN algorithm based on center of gravity is obtained by combining the KNN algorithm based on center of gravity with the fast KNN algorithm with low time complexity.The algorithm steps are basically consistent with the fast KNN algorithm with low time complexity.The difference is that the k value of the algorithm is all equal to 1,and the mean value is used as the class center of gravity of each class.The sample to be classified only needs to calculate the distance with the class center of gravity.The other steps are consistent with the fast KNN algorithm with low time complexity.Experimental results show that the proposed algorithm is more efficient than the traditional algorithm.Thirdly,the traditional distance-based outlier detection algorithm is improved,and a fast distance-based outlier detection algorithm is proposed.The traditional algorithm needs to calculate the distance between the sample to be tested and all samples.The proposed algorithm starts with the second sample to be tested(the first arbitrarily selected sample,which is consistent with the traditional algorithm)and does not need to compute the distance with all other samples.A hollow hypersphere is established when the distance between the sample to be measured and the center of the circle is greater than the distance between the sample to be measured and the center of the circle and the sample in the hypersphere with the radius of the neighborhood given in advance.The radius of the hollow part is the distance between the sample to be measured and the center of the circle minus the given neighborhood radius.The supersphere and hollow part samples do not need to calculate the distance from the samples to be tested,so as to reduce the computational complexity of the algorithm.Experimental results show that the detection efficiency of the proposed algorithm is higher than that of the traditional algorithm.Fourthly,a fast KNN outlier detection algorithm with low time complexity is proposed based on the idea of fast KNN algorithm with low time complexity.The difference between this algorithm and the traditional distance-based outlier data detection algorithm is as follows:the traditional algorithm is to compare the number of neighbors of the detected sample with the given neighborhood parameter ? under the condition that the neighborhood radius of the detected sample is all d.The proposed algorithm is to compare the distance between the preset neighborhood radius d and the farthest sample in the neighborhood of the detected sample under the condition that the number of neighbors in the detected sample is all ?.Experimental results show that the detection efficiency of the proposed algorithm is higher than that of the traditional algorithm.
Keywords/Search Tags:KNN algorithm, Fast KNN algorithm, Outlier data, Hollow hypersphere, Time complexity
PDF Full Text Request
Related items