In recent years, with the arrival of the big data era, the amount of data generated in every field has exploded. How to mine and use the valuable information hidden in massive data has become a focus of research, and cluster analysis is an important tool for processing such data. This article studies the Density Peaks Clustering (DPC) algorithm. The core idea of the algorithm is that cluster centers have a high local density and are far away from other cluster centers. DPC uses a decision graph to select the initial centers and then rapidly assigns labels to the remaining non-center points. As an efficient clustering algorithm that requires no prior knowledge and can recognize clusters of arbitrary shape, DPC has been favored by researchers since it was proposed. However, DPC also has limitations: the cut-off distance parameter must be set in advance, and the clustering result is sensitive to its value; the selection of initial centers is subjective; the algorithm cannot handle datasets with large differences in density; and its time complexity is high. It is therefore necessary to optimize and improve DPC and to broaden its scope of application. This article classifies and summarizes existing DPC optimization algorithms and proposes new algorithms that address the defects of DPC. The main research contents are as follows:

(1) DPC has been widely used in many fields in recent years, but it has limitations in several respects, and many optimization algorithms have emerged to address them. This article surveys these improved density peaks clustering algorithms. Starting from the basic principle of DPC, the limitations of the DPC execution process are
highlighted and representative DPC optimization algorithms are summarized. The DPC optimization algorithms proposed in recent years are divided into four major categories according to their optimization direction, and the advantages, disadvantages, and core strategies of each category are summarized and compared, which makes it easier to identify limitations in the execution of DPC algorithms and their solutions.

(2) A density peaks clustering algorithm based on shared nearest neighbors and proximity, SNNDPC-ID, is proposed to solve two problems: sparse clusters are missed when the density difference between clusters is large, and label assignment for non-center points suffers from the "Domino" problem. First, shared nearest neighbors are used to define a new local density based on the proximity of each data object to its K nearest neighbors. Second, Laplacian Eigenmaps is used to reduce dimensionality, projecting the dataset into a lower-dimensional space. Finally, in the low-dimensional space, appropriate initial cluster centers are selected based on the newly defined local density and relative distance, and the remaining data objects are clustered with an enhanced two-step allocation strategy. Comparative experiments show that the algorithm performs well on synthetic and UCI datasets, especially on datasets with uneven density distribution and on high-dimensional datasets.

(3) To address DPC's sensitivity to the cut-off distance parameter and its difficulty with datasets of complex shape and structure, a density peaks clustering algorithm based on natural neighbors and cluster backbones, DPC-NN-CB, is proposed. First, natural nearest neighbors are introduced and the local density of data points is redefined, which copes with large differences in sparsity between clusters without specifying any parameters. In addition, searching for backbone points reveals the structure and shape of potential clusters, making
full use of the distribution information of the dataset and forming backbone micro-clusters within local ranges. Finally, following the idea of agglomerative hierarchical clustering, a fusion scoring mechanism between backbone micro-clusters is established to merge them and obtain more accurate clustering results. Experimental results on synthetic and UCI datasets show that the clustering accuracy of DPC-NN-CB is significantly better than that of the other clustering algorithms compared...
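
For readers unfamiliar with the baseline that the proposed algorithms modify, the following is a minimal sketch of standard DPC, not of SNNDPC-ID or DPC-NN-CB. It assumes the cut-off kernel for local density and replaces manual inspection of the decision graph with an automatic choice of the points with the largest product of density and relative distance. The function name `dpc` and the parameters `dc` and `n_clusters` are hypothetical names chosen for illustration.

```python
import numpy as np

def dpc(X, dc, n_clusters):
    """Baseline density peaks clustering (illustrative sketch, cut-off kernel)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of other points closer than the cut-off distance dc.
    rho = (d < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)           # relative distance of each point
    parent = np.full(n, -1)       # nearest point that precedes it in density order
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, its delta is its largest distance.
            delta[i] = d[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]
        parent[i] = j
    # Decision value: centers combine high density with large relative distance.
    gamma = rho * delta
    # Assumption of this sketch: the global density peak is always a center.
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # One-pass allocation: each remaining point inherits its parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta

# Usage: two compact, well-separated groups of points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(10.0, 0.3, (30, 2))])
labels, rho, delta = dpc(X, dc=0.5, n_clusters=2)
```

The one-pass allocation at the end is where the "Domino" problem arises: a point that inherits a wrong label passes it on to every point that depends on it. The dependence of `rho` on `dc` likewise shows why the result is sensitive to the cut-off distance.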