
Research And Application Of Density Peak Clustering Algorithm Based On Natural Neighbors And Representative Points

Posted on: 2021-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: C L Yao
GTID: 2518306107483784
Subject: Engineering
Abstract/Summary:
Clustering is an important branch of data mining. Its goal is to divide a data set into several clusters without prior knowledge, so that samples within a cluster are as similar as possible and samples in different clusters are as dissimilar as possible. Clustering has been widely used in many fields, such as network security, information extraction, and image segmentation. According to their technical routes, clustering algorithms can be roughly divided into five categories: partition-based, density-based, hierarchical, grid-based, and model-based.

In 2014, Rodriguez et al. proposed clustering by fast search and find of density peaks (DPC). DPC introduces a decision graph to select cluster centers, and its one-step allocation strategy for the remaining (non-center) samples makes the algorithm efficient. Despite these advantages, DPC has two disadvantages: (1) the one-step allocation strategy propagates errors, since one wrongly assigned sample causes the samples that reference it to be wrongly assigned as well, so the algorithm cannot effectively process data sets containing complex manifold clusters; (2) the algorithm is sensitive to its initial parameter, the cutoff distance dc.

To address these two shortcomings, this thesis proposes DPCNNR, a new density peak clustering algorithm based on natural neighbors and representative points. First, we select local representative points that are likely to be referenced by other data objects during allocation and calculate the geodesic distance. Then we use the idea of the DPC algorithm to cluster the local representative points, so that the selected representative points are correctly clustered. Finally, we assign each non-representative point to a cluster according to its relationship with the representative points. Together, these steps improve the overall clustering result.

The main content and work of this thesis are as follows:

(1) A new density peak clustering algorithm, DPCNNR, based on natural neighbors and representative points is proposed. Analysis shows that data points in locally dense areas often become the reference objects during assignment. As long as these high-density points are correctly clustered, the overall clustering result improves. The technical route of the proposed algorithm is therefore: first, select effective local representative points and calculate the geodesic distance; then, cluster the representative points with the DPC algorithm; finally, allocate the remaining samples using the relationship between non-representative points and representative points. To ensure the validity of the representative points, they are selected using the parameter-free idea of natural neighbors, so DPCNNR does not need manually set parameters and can effectively process data sets with complex manifold distributions. Experimental results show that the advantages of DPCNNR are more prominent when processing complex data sets.

(2) A label distribution learning algorithm, LDL-DPCNNR, based on DPCNNR is proposed. The thesis studies the application of DPCNNR to label distribution learning. On the premise that samples with similar features have similar label distributions, clustering is applied to label distribution learning. The data are first divided into several clusters by the clustering algorithm according to the characteristics of the data set. A parametric model is then built from the distances between the sample to be predicted and the clusters, and is used to calculate the label distribution of that sample. Experiments show that the label distribution learning algorithm combined with the clustering algorithm gives better predictions than existing label distribution learning algorithms.
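
For orientation, the baseline DPC procedure that the abstract builds on can be sketched as follows. This is a minimal illustration of the original density peak idea (cutoff-kernel local density, distance to the nearest denser point, centers chosen by the decision-graph score, one-step allocation), not the DPCNNR algorithm of the thesis. The function name `density_peak_clustering`, the parameter `n_clusters`, and the use of the product of density and distance to rank centers are assumptions made for this example; the thesis itself selects centers from a decision graph and does not specify these details in the abstract.

```python
import math


def density_peak_clustering(points, dc, n_clusters):
    """Baseline DPC sketch (hypothetical helper, not the thesis's DPCNNR)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    # Indices by decreasing density; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta: distance to the nearest point that ranks higher in density.
    # parent: that point, which is the reference used during allocation.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: points with the largest rho * delta (assumed ranking rule).
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id

    # One-step allocation: each remaining point copies the label of its parent.
    # A wrong label on a parent is inherited by every point that references it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

The final loop shows why the allocation is both fast and fragile: every non-center point takes its label from a single reference point, which is the error propagation the thesis aims to limit by first clustering representative points. The dependence of `rho` on `dc` likewise illustrates the parameter sensitivity mentioned above.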
Keywords/Search Tags:Clustering, Density Peaks Clustering Algorithm, Natural Neighbor, Representative Point, Label Distribution Learning