| With the growing scale of information networks, analyzing and processing complex, diverse, and massive data efficiently has become a research priority. This need has accelerated work on clustering algorithms in machine learning and led to the density peak clustering algorithm (Clustering by Fast Search and Find of Density Peaks, DPC). This paper focuses on the DPC algorithm, which has attracted wide attention from scholars since it was proposed. Its advantages include efficient implementation, a novel design concept, the ability to handle nonlinearly separable datasets, fast identification of cluster centers through decision graphs, and insensitivity to outliers. However, the DPC algorithm also has defects: high computational complexity, subjective selection of cluster centers, and a data assignment process prone to chain-reaction errors, in which one misassigned point causes the points that depend on it to be misassigned as well. To address these deficiencies, this paper designs two improved DPC algorithms. The first targets the poor performance of DPC on clusters with multiple density peaks, the empirical choice of cluster centers from decision graphs, and the non-robust data assignment process; for these problems, a new density peak clustering algorithm based on a cluster fusion strategy is proposed. First, the algorithm screens candidate cluster centers by setting two new thresholds, which avoids the influence of noise points and outliers. Second, taking the structural characteristics and spatial distribution of the dataset into account, new definitions of boundary points, inter-cluster intersection density, and inter-cluster boundary density are given. To correctly handle clusters that contain multiple density peaks, a new cluster fusion strategy is designed, which not only selects the cluster centers correctly but also corrects the chain-reaction errors in the data point assignment
process. Finally, experiments are conducted, and the results indicate that the new algorithm substantially improves clustering accuracy and robustness. The second algorithm targets the poor clustering performance of DPC on unevenly distributed datasets, its distance-only calculation that ignores the correlation between samples, and its reliance on intuition to pick cluster centers from decision graphs; for these problems, a density peak clustering algorithm based on shared neighborhood is presented. First, the neighbor information of the data points and the degree of relationship between them are considered, and the local density is redefined according to the shared neighborhood. Second, a new decision threshold is designed to distinguish cluster centers from non-center points, so that cluster centers are obtained automatically without human intervention. Finally, comparison experiments are conducted, and the results indicate that the new algorithm improves accuracy and stability while maintaining the original computational complexity. |
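
For readers unfamiliar with the baseline, the two quantities that DPC plots in its decision graph can be sketched as follows. This is a minimal illustration of the standard cutoff-kernel definitions, not the improved algorithms proposed here. The function names, the cutoff `d_c`, and the thresholds `rho_min` and `delta_min` are assumptions chosen for the example; the thresholds only stand in for the general idea of screening candidate centers and do not reproduce this paper's two thresholds.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta) per point using the baseline cutoff-kernel DPC definitions.

    points: list of coordinate tuples; d_c: cutoff distance (assumed parameter name).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points lying closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    delta = []
    for i in range(n):
        # Distances from point i to every point of strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # delta_i: distance to the nearest denser point. A point with no denser
        # point gets its largest distance to any point (a simplification that
        # also applies to ties at the maximum density).
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def candidate_centers(rho, delta, rho_min, delta_min):
    """Hypothetical screening rule: keep points that exceed both thresholds."""
    return [i for i in range(len(rho)) if rho[i] > rho_min and delta[i] > delta_min]


# Two small square clusters, each with one point in the middle.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta = dpc_quantities(pts, d_c=0.8)
centers = candidate_centers(rho, delta, rho_min=2, delta_min=1.0)
print(centers)  # indices of the two middle points: [4, 9]
```

In this toy input only the two middle points combine a high local density with a large distance to any denser point, which is the pattern the decision graph is meant to expose. Choosing the thresholds by hand is exactly the subjective step that the abstract identifies as a weakness of the baseline.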