Research On Clustering Of Density Peak Algorithm Based On Natural Neighbors

Posted on: 2022-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2518306353984059
Subject: Software engineering
Abstract/Summary:
The density peak algorithm has become a popular clustering algorithm in recent years. It is simple and efficient, requires few parameters, and can identify non-convex data sets. It has therefore attracted wide attention from researchers and has been applied in many practical scenarios. However, its clustering results are sensitive to parameters, the appropriate number of clusters must be selected manually, and data sets with uneven density distribution cannot be handled effectively. This thesis studies strategies for these problems: by introducing the idea of natural neighbors, it makes the algorithm genuinely self-adaptive and improves its performance. The specific research content is as follows.
(1) To address the influence of parameters on the clustering result and the difficulty of determining the number of clusters, a density peak algorithm that uses the elbow method to determine the number of clusters is proposed. First, a new kernel based on the idea of natural neighbors is proposed to improve the calculation of the local density of data points. Then, the elbow method searches the candidate centers selected from the decision graph to find the inflection point, and the density peak algorithm uses this inflection point to determine the true number of clusters in the data set. Finally, the search process for natural neighbors is improved with a decrease-and-conquer strategy, which makes the algorithm efficient enough to process large-scale data sets. Introducing natural neighbors lets the algorithm adapt to the data instead of clustering with empirically chosen settings. Experiments on multiple artificial data sets and UCI real data sets verify that the proposed method is self-adaptive without being restricted by parameter settings, improves the performance of the algorithm, and obtains the true number of clusters in the data set, which solves the difficulty of determining the number of clusters.
(2) To address the inability to handle data sets with uneven density distribution, a density peak algorithm based on a hierarchical idea is proposed. In the classification mechanism for candidate centers, data points are no longer assigned according to higher-density points; instead, the connectivity or dissimilarity between sub-clusters is calculated, and the sub-clusters with higher connectivity are merged in turn. Calculating connectivity requires first obtaining the intersection area between sub-clusters. This thesis uses the idea of natural neighbors to obtain the intersection between two sub-clusters: when the natural neighbors of a data point in one cluster include data belonging to another cluster, that point is taken as part of the intersection area between the two clusters. Introducing natural neighbors makes the size of the intersection between sub-clusters adaptive. It avoids the problem of the traditional threshold-based calculation, in which a threshold that is too large or too small produces too much intersection or none at all. It also allows the connectivity between sub-clusters to be calculated more accurately and yields better classification results. Experiments on multiple artificial data sets and UCI real data sets verify that the proposed method can effectively handle data sets with uneven density distribution and is feasible.
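The intersection rule in part (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `natural_neighbors` and `boundary_points` are hypothetical, the search follows the commonly described natural neighbor procedure (grow the neighborhood size r until every point has a reverse neighbor or the number of points without one stops changing), and treating natural neighbors as mutual r-nearest neighbors is an assumption. The thesis's improved search, its new density kernel, and its connectivity measure are not reproduced here.

```python
import numpy as np


def natural_neighbors(X):
    """Parameter-free neighbor search (assumed variant of natural neighbor search).

    Returns (r, neighbors), where r is the neighborhood size at which the
    search stopped and neighbors[i] is the set of mutual r-nearest
    neighbors of point i.
    """
    n = len(X)
    # Pairwise Euclidean distances; O(n^2) memory, acceptable for a sketch.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    order = np.argsort(dist, axis=1)  # order[i, k] is the (k+1)-th nearest neighbor of i

    reverse_count = np.zeros(n, dtype=int)  # how often each point was chosen as a neighbor
    prev_orphans = -1
    r = 0
    while r < n - 1:
        r += 1
        # Every point votes for its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        orphans = int(np.sum(reverse_count == 0))
        # Stop when no point is left without a reverse neighbor,
        # or when the number of such points no longer changes.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    knn = [set(order[i, :r].tolist()) for i in range(n)]
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


def boundary_points(labels, neighbors):
    """Map each ordered sub-cluster pair (a, b) to the points of a that
    have at least one natural neighbor in b."""
    border = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if labels[j] != labels[i]:
                border.setdefault((labels[i], labels[j]), set()).add(i)
    return border


if __name__ == "__main__":
    # Toy 1-D data: sub-clusters 0 and 1 touch, sub-cluster 2 is far away.
    X = np.array([[0.0], [1.0], [2.1], [3.3], [10.0], [11.0], [12.1], [13.3]])
    labels = [0, 0, 1, 1, 2, 2, 2, 2]
    r, nbrs = natural_neighbors(X)
    print("r =", r)
    print("border points:", boundary_points(labels, nbrs))
```

In this toy data only sub-clusters 0 and 1 share border points, so they would be the candidates for merging, while sub-cluster 2 has an empty intersection with both. No distance threshold is set by hand; the neighborhood size comes out of the search itself.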
Keywords/Search Tags:Density peak clustering, Natural neighbors, Number of clusters, Classification mechanism