Density Peak Clustering Study Based On Bayesian And Statistical Strategies

Posted on:2023-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:T Wu

GTID:2568307064970509

Subject:Computer technology

Abstract/Summary:

Clustering is an important research task in data mining: given an unlabeled data collection, it uses a similarity measure to divide the collection into classes of similar objects. The resulting labeled partition of the data can support decision making. The existing density peak clustering (DPC) algorithm has two problems when processing complex-structured data with uneven density: the cut-off distance is difficult to determine manually, and a misassigned point propagates its error to the points that depend on it. To address these problems, this thesis proposes two improved DPC algorithms, one based on Bayesian optimization (BO-DPC) and one based on K-nearest neighbors (KNN) and a statistical learning strategy (KE-DPC). The main research contents are as follows:

(1) On complex-structured datasets, the cut-off (truncation) distance of DPC must be set empirically, which makes the clustering results highly subjective. This thesis proposes a density peak clustering algorithm based on Bayesian optimization, which introduces Bayesian optimization theory into density peak clustering and searches the parameter space adaptively. First, a set of initial candidate points for the cut-off distance is generated. Then, Gaussian process regression is used to compute the mean and variance of the objective function at the candidate points, and the mean and variance are used to evaluate the acquisition function, which determines the next sampling point. This process is iterated to find the optimal value in the solution space, and the clustering is then carried out with that value. Results on several complex-structured artificial datasets and on real datasets verify the effectiveness of the BO-DPC algorithm: the values of various clustering evaluation indexes are improved.

(2) On complex-structured datasets with uneven density, once DPC assigns a sample point to the wrong cluster, the error propagates as a chain reaction. To address this, the KE-DPC algorithm is proposed. First, the cluster centers are determined by the measurement function and the decision plot. The remaining points are then divided into high-density cluster points, boundary points, and noise points. K-nearest neighbors are used to assign labels to the high-density cluster points. Then the statistical learning strategy is used to calculate the probability that each boundary point belongs to a high-density cluster, and the boundary point is labeled accordingly, which completes the clustering process. Results on several complex-structured artificial datasets and on real datasets verify the effectiveness of the KE-DPC algorithm: the values of the clustering evaluation indexes are improved compared with other algorithms.

Experimental results show that the BO-DPC and KE-DPC algorithms proposed in this thesis identify the number of clusters and assign boundary points more accurately on complex-structured datasets. Clustering is widely used in image segmentation and machine learning, so the proposed algorithms have potential application value in those fields.
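For orientation, the baseline DPC procedure that both proposed algorithms start from can be sketched as below. This is a minimal illustration of the standard cut-off-kernel formulation, not the BO-DPC or KE-DPC algorithm of the thesis. The function name `dpc`, the parameters `d_c` and `n_clusters`, and the toy data are assumptions made for the example. Picking the centers as the points with the largest density-times-distance product is a simplification that stands in for reading the decision plot.

```python
import math

def dpc(points, d_c, n_clusters):
    """Baseline density peak clustering with a cut-off kernel (illustrative sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cut-off distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in the density order.
    # parent_i remembers that point, because non-center points inherit its label.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest point gets its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Simplified center selection: the n_clusters largest values of rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # One-pass assignment in density order: each point copies its parent's label.
    # A single wrong parent therefore mislabels every point that descends from it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels

# Toy data (assumed): two well-separated square blobs, each with a dense middle point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, labels = dpc(points, d_c=0.8, n_clusters=2)
print(labels)
```

The sketch makes both problems named in the abstract visible: `rho`, and with it the whole result, depends on the hand-chosen `d_c`, and the final loop propagates labels along the `parent` links, so one misassignment carries over to all dependent points.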

Keywords/Search Tags:

Cluster analysis, Density peak clustering, Truncated distance, Bayes, Statistical learning strategies

Related items

1	Research And Improvement Of Density Peak Clustering Algorithm
2	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification
3	Research On Two Improved Density Peaks Clustering Algorithms
4	Research On Several Improved Density Peak Clustering Algorithms And Their Applications
5	Research On Application And Optimization Of Density Peak Clustering
6	Optimization Research Based On Density Peak Clustering Algorithm
7	Research And Application Of Clustering Algorithm Based On Density Peak
8	Research On Improved Density Peak Clustering Algorithm
9	Research On Density Peak Clustering Algorithm Based On Adaptive Reachable Distance
10	Research On Path-based Of Clustering By Fast Search And Find Of Density Peak