Research And Application Of Fast Density Peak Clustering Algorithm

Posted on:2021-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:L L Shen

GTID:2428330611962517

Subject:Computer technology

Abstract/Summary:

Advances in storage technology and the continuous generation of data in daily life and work have ushered in the era of big data. By analyzing and mining massive data, people can obtain the valuable information they need. However, the speed at which massive data can be processed still falls short of demand, so efficiently extracting valuable information from large-scale data has become a difficult problem in data processing. Machine learning plays an important role in solving this kind of problem, and clustering is an important branch of machine learning.

The density peak clustering algorithm (DPeak) is one of the currently popular clustering algorithms. It has the advantages of a simple idea, a single parameter, and the ability to find clusters of arbitrary shape. Because of these advantages, DPeak attracted the attention of a large number of researchers as soon as it was proposed. However, its time complexity is O(n^2), which makes it unsuitable for large-scale data. The quadratic cost arises because the algorithm computes the density ρ and the distance δ by brute force, which involves many redundant calculations. This thesis analyzes the DPeak algorithm in depth and, building on earlier work while keeping its strengths and discarding its weaknesses, proposes a fast density peak clustering algorithm that significantly speeds up DPeak on large-scale data. The main contributions are as follows:

(1) The thesis analyzes the nature of the DPeak algorithm and discusses which category of clustering algorithm it belongs to. DPeak is compared with five classic clustering algorithms: k-means, DBSCAN, spectral clustering, affinity propagation, and mean shift. The comparison shows that DPeak is very similar to mean shift, and the thesis conjectures that DPeak may be a special case of the mean shift algorithm. Whether DPeak can be explained within the mean shift framework remains to be studied further.

(2) Because the O(n^2) complexity of DPeak is unsuitable for large-scale data, the thesis proposes FastDPeak. The algorithm uses a cover tree to speed up the calculation of the density ρ. In addition, the calculation of δ is reduced from a global search to a local search, which lowers the time complexity of computing δ to O(n). Overall, the time complexity of FastDPeak is O(n log n). Experimental results on multiple data sets show that FastDPeak is effective and outperforms other DPeak variants, which is significant for improving the speed of large-scale data processing.
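
For illustration, the brute-force computation of ρ and δ and the local-search idea for δ can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation from the thesis: it assumes the cutoff-kernel density of the original DPeak formulation, a hypothetical cutoff `d_c`, a hypothetical neighborhood size `k`, and an index-based tie-break for points of equal density. The neighbor lists are found here by sorting rather than with a cover tree, so the sketch shows only that the local search can return the same δ values as the global one, not the speed-up itself.

```python
import math

def pairwise(points):
    """Full distance matrix: this is the O(n^2) step of brute-force DPeak."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def density(dist, d_c):
    """Cutoff-kernel density (assumed): number of other points closer than d_c."""
    return [sum(1 for j, d in enumerate(row) if j != i and d < d_c)
            for i, row in enumerate(dist)]

def is_higher(rho, j, i):
    """True if j outranks i in density; ties are broken by lower index (assumed)."""
    return rho[j] > rho[i] or (rho[j] == rho[i] and j < i)

def delta_global(dist, rho):
    """delta_i: distance to the nearest higher-density point, found by a global scan."""
    n = len(rho)
    delta = []
    for i in range(n):
        cand = [dist[i][j] for j in range(n) if j != i and is_higher(rho, j, i)]
        # The single top-ranked point has no higher point; use its largest distance.
        delta.append(min(cand) if cand else max(dist[i]))
    return delta

def delta_local(dist, rho, k):
    """Look for a higher-density point among the k nearest neighbors first;
    only local density peaks fall back to the global scan."""
    n = len(rho)
    delta = []
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dist[i][j])[:k]
        cand = [dist[i][j] for j in neighbors if is_higher(rho, j, i)]
        if not cand:  # local density peak: no higher point nearby
            cand = [dist[i][j] for j in range(n) if j != i and is_higher(rho, j, i)]
        delta.append(min(cand) if cand else max(dist[i]))
    return delta

# Toy data: two small clumps, each a unit square with a point at its center.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = pairwise(points)
rho = density(dist, d_c=1.0)
delta = delta_global(dist, rho)

# Rank points by rho * delta; the two largest values are the candidate centers.
gamma = [r * d for r, d in zip(rho, delta)]
peaks = sorted(sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2])
print(rho, peaks)
```

On this toy data the local search agrees with the global scan because any point that has a higher-density point among its k nearest neighbors must also have its nearest higher-density point within that neighborhood distance. Ranking by the product ρ·δ is one common way to pick candidate centers; the selection rule used in the thesis is not stated in the abstract.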

Keywords/Search Tags:

data mining, clustering algorithm, density peak clustering, big data

Related items

1	Research And Application Of Density Peak Clustering Algorithm Based On Spark Framework
2	Research And Application Of Fast Density Peak Clustering Algorithm
3	Research On Hierarchical Clustering Algorithm Based On Density Peaks
4	Research Of Clustering Algorithm Based On Density Peak
5	Research And Implementation Of Density Peaks Clustering Algorithm
6	Research And Application Of Clustering By Fast Search And Find Of Density Peaks
7	Research On Density Peak Clustering And Its Application In Community Detection
8	Multi-Granular Big Data Analytics Based On Density Peak
9	Research On High Dimensional Data Clustering Algorithm Based On Subspace And Density Peak
10	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation