
Density Peak Clustering Study Based On Bayesian And Statistical Strategies

Posted on: 2023-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Wu
Full Text: PDF
GTID: 2568307064970509
Subject: Computer technology
Abstract/Summary:
Clustering is an important research task in data mining. Given an unlabeled data collection, it uses a similarity measure to divide the collection into multiple classes, each composed of similar objects. The resulting labeled groups support decision makers in choosing a course of action. Existing density peak clustering (DPC) has two problems when processing complex structural data with uneven density: the cut-off distance is difficult to determine manually, and a misassigned point propagates its error to other points. To address these problems, this paper proposes two improved DPC algorithms, one based on Bayesian optimization and one based on KNN and a statistical learning strategy (denoted BO-DPC and KE-DPC, respectively). The main research contents are as follows:

(1) When DPC is applied to datasets with complex structure, the cut-off distance must be set empirically, which makes the clustering results highly subjective. This paper proposes a density peak clustering algorithm based on Bayesian optimization, which introduces Bayesian optimization theory into density peak clustering and searches the parameter space adaptively. First, a set of initial candidate points for the cut-off distance is generated. Then, Gaussian process regression is used to compute the mean and variance of the objective function at the candidate points, and the mean and variance are used to evaluate the acquisition function that determines the next sampling point. This procedure is iterated to find the optimal value in the solution space, after which the clustering itself is carried out. Results on several artificial datasets with complex structure and on real datasets verify the effectiveness of the BO-DPC algorithm: the values of various clustering evaluation indexes are improved.

(2) On complex structural datasets with uneven density, once DPC misassigns a sample point, the error triggers a chain reaction in the points assigned after it. To address this, the KE-DPC algorithm proceeds as follows. First, the cluster centers are determined from the measurement function and the decision plot. The remaining points are then divided into high-density cluster points, boundary points and noise points. K nearest neighbors (KNN) are used to assign labels to the high-density cluster points. A statistical learning strategy is then used to calculate the probability that each boundary point belongs to a high-density cluster, and the boundary point is labeled accordingly, which completes the clustering process. Results on several artificial datasets with complex structure and on real datasets verify the effectiveness of the KE-DPC algorithm: the values of the cluster evaluation indexes are improved compared with other algorithms.

Experimental results show that the BO-DPC and KE-DPC algorithms proposed in this paper can more accurately identify the number of clusters and assign boundary points on complex structural datasets. Clustering can be widely used in image segmentation and machine learning, and has potential application value for practitioners in those fields.
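For orientation, the baseline DPC procedure that both proposed algorithms build on can be sketched as follows. This is a minimal illustration of standard cut-off-kernel DPC, not the thesis's BO-DPC or KE-DPC code; the function name `dpc`, the parameter names `d_c` and `n_clusters`, and the choice of picking centers by the largest product of density and distance are assumptions made for this example.

```python
import numpy as np

def dpc(X, d_c, n_clusters):
    """Baseline density peak clustering sketch (hypothetical helper)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of other points closer than the cut-off distance.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest (stable sort fixes tie order).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, delta is its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        # Nearest point among those ranked denser than i.
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Assumed center rule: take the n_clusters largest rho * delta values.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Both issues named in the abstract are visible here: `d_c` has to be supplied by hand, and the final loop copies each label from a single denser neighbor, so one wrong assignment is inherited by every point that depends on it.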
Keywords/Search Tags:Cluster analysis, Density peak clustering, Truncated distance, Bayes, Statistical learning strategies