Font Size: a A A

Optimization Of Density Peak Algorithm Based On Shortest Path

Posted on:2023-08-12Degree:MasterType:Thesis
Country:ChinaCandidate:M J LiFull Text:PDF
GTID:2558306911985579Subject:Control theory and control engineering
Abstract/Summary:PDF Full Text Request
Clustering by Fast Search and Find of Density Peaks Algorithm(DPC)has been widely used since it was proposed because of its simple idea,few required parameters,and high Clustering efficiency.However,with the in-depth study of the DPC algorithm,some problems and defects are gradually exposed:it is difficult to determine the truncation threshold(9((8),difficult to accurately select the cluster center,easy to cause associated errors in sample division,poor processing effect for data sets with complex manifold structure and large density difference between class clusters,etc.To improve the clustering ability of the DPC algorithm,this thesis proposes an optimization of density peak algorithm based on the shortest path,which optimizes the DPC algorithm in two aspects:the selection of cluster center and the division of sample points outside the cluster center.The main work is as follows:(1)Optimization of the DPC algorithm based on shortest path-selection of clustering centers.The DPC algorithm needs to construct a decision graph in the coordinate graph to select clus-ter centers.When the number of sample points conforming to the characteristics of cluster centers in the decision graph is too much or too little compared with the number of real clus-ters,it is difficult to select the cluster centers with the correct number that can represent the information of real clusters.To solve the problem of the difficult selection of clustering cen-ters in the DPC algorithm,the descending graph of the product of local density and minimum distance was constructed,and the potential clustering centers were determined according to the variation rule of parameter values in the descending graph.The concept of distance reachable is proposed,and based on the shortest path algorithm,if the distance between two potential cluster centers is reachable,the potential cluster centers with higher density should be retained,and the ones with lower density should be discarded,and they should be com-bined.To prevent the clustering centers of different classes from being merged incorrectly,the concept of low-density points is proposed.Low-density points cannot participate in the judicial process of distance reachability,which effectively improves the accuracy of the DPC algorithm to obtain clustering centers.(2)Optimization of the DPC algorithm based on shortest path-sample division.The dis-tribution of sample points outside the cluster center by the DPC algorithm is based on the Euclidean distance to the nearest high-density point.When processing some manifold data sets or density difference data sets,the sample division is easy to cause associated errors.To solve the above problems,this thesis proposed a new sample division method based on the shortest path algorithm.When dividing the labels of the sample points outside the clus-ter,firstly,the sample points in the(9((8)neighborhood of each cluster center are labeled with the same class label as the cluster center.Secondly,the shortest path from the remaining unlabeled sample points to each cluster center is calculated,and then the sample points are allocated according to the weight of the shortest path,which improves the clustering abil-ity of the DPC algorithm for complex manifold and density difference data sets.Moreover,the new truncation threshold is used as the upper limit of the edge distance in the shortest path algorithm.The new threshold is convenient to calculate and smaller than the original threshold under the same conditions,which effectively reduces the number of sample points involved in the iteration and the running time of the shortest path algorithm.Experiments on several artificial data sets and UCI real data sets verify the validity and accuracy of the pro-posed algorithm by comparing it with the DPC algorithm,DBSCAN algorithm,and KDPC algorithm.
Keywords/Search Tags:Density peak, clustering center, allocation strategy, shortest path, truncation threshold
PDF Full Text Request
Related items