Research And Application Of Fast Density Peak Clustering Algorithm

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: L L Shen
GTID: 2428330611962517
Subject: Computer technology
Abstract/Summary:
Advances in storage technology and the continuous generation of data in daily life and work have brought about the era of big data. By analyzing and mining massive data, people can obtain the valuable information they need, but the speed at which such data can be processed still falls short of demand. Efficiently extracting valuable information from large-scale data has therefore become a difficult problem in data processing. Machine learning plays an important role in solving this kind of problem, and clustering is an important branch of machine learning.

The density peak clustering algorithm (DPeak) is one of the currently popular clustering algorithms. It has a simple idea, a single parameter, and the ability to find clusters of arbitrary shape, and it attracted the attention of a large number of researchers as soon as it was proposed. Despite these advantages, DPeak is not suitable for large-scale data: it computes ρ and δ by brute force, so its time complexity is O(n^2) and much of the computation is redundant. This thesis analyzes the DPeak algorithm in depth and, building on prior work by keeping what is essential and discarding what is not, proposes a fast density peak clustering algorithm that significantly improves the speed of DPeak on large-scale data. The main contributions are as follows:

(1) The thesis analyzes the nature of the DPeak algorithm and discusses which family of clustering algorithms it belongs to. Comparing DPeak with five classic clustering algorithms (k-means, DBSCAN, spectral clustering, affinity propagation, and mean shift) shows that DPeak is very similar to mean shift. The thesis conjectures that DPeak may be a special case of mean shift. Whether DPeak can be explained within the mean shift framework, however, remains to be studied further.

(2) Because the O(n^2) complexity of DPeak makes it unsuitable for large-scale data, the thesis proposes FastDPeak. This algorithm uses a cover tree to speed up the computation of the density ρ. In addition, the computation of δ is reduced from a global search to a local search, which lowers the time complexity of computing δ to O(n). Overall, the time complexity of FastDPeak is O(n log n). Experimental results on multiple data sets show that FastDPeak is effective and performs better than other DPeak variants, which is significant for speeding up data processing.
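For reference, the brute-force computation that the abstract identifies as the bottleneck can be sketched as follows. This is a minimal sketch based on the standard DPeak definitions (ρ counts the neighbours within a cutoff distance, δ is the distance to the nearest point of higher density), not the code used in the thesis. The function name `dpeak_rho_delta` and the cutoff parameter `d_c` are illustrative assumptions.

```python
import math


def dpeak_rho_delta(points, d_c):
    """Brute-force rho and delta for DPeak (illustrative sketch, O(n^2)).

    points: sequence of equal-length coordinate tuples.
    d_c:    cutoff distance (hypothetical name for DPeak's single parameter).
    """
    n = len(points)

    # All pairwise Euclidean distances: this is the O(n^2) step.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho[i]: number of other points strictly closer than d_c.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    # delta[i]: distance to the nearest point with a higher density.
    # A point with no denser point (a global density maximum) gets the
    # largest distance from it to any other point, by convention.
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))

    return rho, delta
```

Points with both a large ρ and a large δ are the candidate cluster centres. Every point is compared with every other point twice here, once for ρ and once for δ, which is the redundancy that FastDPeak targets with a cover tree and a local δ search.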
Keywords/Search Tags: data mining, clustering algorithm, density peak clustering, big data