Research On Improved Density Peak Clustering Methods Based On K-nearest Neighbors

Posted on: 2022-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Qin
GTID: 2518306491952509
Subject: Computer Software and Application of Computer
Abstract/Summary:
Clustering by fast search and find of density peaks (DPC) is a density-based clustering algorithm that is widely used in data mining, pattern recognition, bioinformatics, and other fields. DPC can automatically find outliers and identify clusters regardless of the shape of the clusters and the dimensionality of the space in which they are embedded. In recent years, DPC has attracted increasing attention from researchers. However, current DPC algorithms still have many problems that need further study and improvement. To address these issues, this thesis studies the density formula, the strategy for identifying cluster centers, and the point assignment steps in order to improve the clustering performance of DPC. The effectiveness of the proposed algorithms is verified through theoretical analysis and comparative experiments. The main work of this Master's thesis is as follows:

(1) It is difficult for DPC to select a cut-off distance when calculating the local density of points, and DPC easily ignores cluster centers with lower density in data sets with variable densities. In addition, for clusters with complex shapes, DPC selects only one cluster center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this thesis presents a novel DPC model based on K-nearest neighbors (KNN) and self-recommendation, called DPC-MC. First, the KNN-based neighborhood of a point is defined, the mutual neighbor degree of a point is presented within this neighborhood, and a new local density based on the mutual neighbor degree is proposed. This local density does not require a cut-off distance to be set manually through analysis. Second, to address the manual setting of cluster centers, a self-recommendation strategy for local centers is provided. Finally, after multiple local centers are selected, the binding degree of microclusters is developed to quantify the degree of combination between a microcluster and its neighboring clusters. Homogeneous clusters are then found according to the binding degree of microclusters while boundary points are deleted layer by layer. The homologous clusters are merged, the points in abnormal clusters are reallocated, and the clustering process ends, which completes the design of the new DPC algorithm. Nine synthetic data sets and twenty-six real-world data sets are used to verify the effectiveness of the algorithm. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms in terms of purity, F-measure, Fowlkes-Mallows index, accuracy, Rand index, adjusted mutual information, normalized mutual information, and adjusted Rand index.

(2) In real-world applications, it is difficult for DPC to select the correct cluster centers for data sets with large density differences between clusters or with multiple density peaks within clusters. In addition, the point allocation method of DPC has low accuracy. To solve these issues, a novel density peak clustering algorithm based on K-nearest neighbors and an optimized allocation strategy is proposed. First, the candidate cluster centers are determined using KNN, the density of points, and boundary points. The ? distance is proposed to judge whether a sparse region exists between two candidate cluster centers. Based on the ? distance, the density factor is proposed to solve the problem of multiple density peaks, and the distance factor is constructed to reflect how closely two candidate cluster centers are related. Based on the density factor and the distance factor, the possibility that a candidate cluster center is an actual cluster center is calculated. Second, to improve the robustness of DPC, four similarity measures between two points are constructed from their shared nearest neighbors, their high-density nearest neighbors, their density difference, and the distance between their KNN, respectively. Based on these four similarity measures, the concepts of neighborhood, similarity set, similarity domain, and approximate domain are proposed to assist in the allocation of points. The initial clustering results are determined according to similarity domains and boundary points, and the intermediate clustering results are then obtained from the cluster centers and the initial clustering results. Finally, according to the intermediate clustering results and the similarity set, the clusters are divided into multiple layers from the cluster centers to the cluster boundaries, and different point allocation strategies are designed for different layers. For the points of a given layer, an allocation ratio based on the similarity domain is proposed to determine the order in which the points are allocated. Each point is allocated to the dominant cluster in its similarity domain. These steps yield the final clustering results. In comparisons with the latest related DPC algorithms on 11 synthetic data sets and 27 real data sets, the experimental results demonstrate that the presented algorithm performs well in terms of purity, F-measure, accuracy, Rand index, adjusted Rand index, and normalized mutual information.

(3) To further verify the clustering performance of the two improved KNN-based DPC methods above, the two algorithms are applied to the clustering analysis of image data. First, the two improved DPC algorithms are simulated and tested on image data sets such as face images, physical images, and handwritten digits. Second, clustering analysis is performed on these image data sets using different evaluation metrics, and the experimental results of the proposed DPC algorithms are compared with those of state-of-the-art clustering algorithms. Finally, the clustering results on all image data sets are discussed in detail. The results show that the two proposed DPC algorithms effectively improve clustering accuracy on all the image data sets. That is, they cluster image data better than the other clustering algorithms, they can efficiently cluster a variety of types of image data sets, and they can be applied in practical fields such as face recognition and image retrieval.
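
For orientation, the baseline DPC procedure that both proposed methods build on can be sketched roughly as follows. This is a minimal illustration only, not the thesis's DPC-MC model or its optimized allocation strategy. The KNN-based density used here (the exponential of the negative mean squared distance to the k nearest neighbors) is a commonly used substitute for the cut-off-distance density and is an assumption; the abstract does not define the mutual neighbor degree. The function name `knn_density_peaks` and its parameters are hypothetical.

```python
import numpy as np


def knn_density_peaks(points, k, n_clusters):
    """Baseline DPC-style clustering with an assumed KNN-based density."""
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a sketch).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumed density: exp(-mean squared distance to the k nearest neighbors).
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_dist ** 2).mean(axis=1))

    # delta_i: distance to the nearest point with higher density.
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    top = order[0]
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[top] = dist[top].max()  # convention for the global density peak
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Centers: the points with the largest gamma = rho * delta.
    gamma = rho * delta
    gamma[top] = np.inf  # the global density peak is always taken as a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Assign in decreasing density, so each point's parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
                      rng.normal(5.0, 0.3, size=(20, 2))])
    labels, centers = knn_density_peaks(data, k=5, n_clusters=2)
    print(labels, centers)
```

Note that this sketch still takes the number of clusters as input and assigns each point in a single pass to the cluster of its nearest higher-density neighbor. These are the center selection and allocation steps that the two proposed methods replace.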
Keywords/Search Tags: Density peaks clustering, K-nearest neighbors, similarity measure, cluster center, allocation strategy