Research On Density Peaks Clustering Algorithm Based On Nearest-Neighbor Optimization

Posted on: 2024-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhang
GTID: 2568307082479874
Subject: Electronic information
Abstract/Summary:
Clustering by fast search and find of density peaks (CFSFDP) offers a novel idea, easy implementation, and efficient clustering, and it has been widely adopted in many fields since it was published in Science in 2014. However, the algorithm also has certain limitations. In this thesis, we address these limitations by incorporating the nearest-neighbor idea into CFSFDP and propose new clustering algorithms, whose effectiveness is verified on public data sets. Finally, we combine the strength of the nearest-neighbor-optimized CFSFDP algorithm in finding initial cluster centers with an improved fuzzy c-means (FCM) algorithm; the two have complementary advantages, and the combination improves clustering performance. The details of the study are as follows.

(1) CFSFDP has several limitations: the sample density metric, defined through a cutoff distance, is not unified; its unstable assignment strategy for the remaining samples can trigger a "Domino Effect"; and wrong density peaks may be picked as cluster centers. To avoid these shortcomings, we propose reverse-nearest-neighbor-based clustering by fast search and find of density peaks (RNN-CFSFDP). We redesign and unify the sample density metric by introducing reverse nearest neighbors, and we combine the newly defined local density with the K nearest neighbors of each sample to make the assignment process more robust and alleviate the "Domino Effect". In addition, we propose a cluster fusion algorithm that further alleviates the "Domino Effect" and effectively prevents wrong density peaks from being picked as cluster centers. Experimental results on publicly available data sets show that, in most cases, the proposed algorithm is superior or at least equivalent to the comparison methods in clustering performance; it performs especially well on manifold and uneven-density data sets.

(2) We use the natural neighbor search algorithm to address a series of problems in CFSFDP and propose density peak clustering optimized by natural neighbor search (NaN-CFSFDP). First, we propose an outlier detection method based on natural neighbor search. Second, because the cutoff distance of CFSFDP is difficult to set accurately by hand, we improve its calculation with natural neighbor search so that it is chosen automatically. Third, we redesign and unify the sample density metric of CFSFDP so that it focuses more on the local information of each sample. Finally, when the density differs greatly between clusters, the density peaks may concentrate in the dense clusters and some clusters are lost; to address this, we introduce shared natural neighbors between samples and between clusters and use them to build a new cluster fusion algorithm. Experimental results on synthetic and real data sets show that, in most cases, the algorithm outperforms or is at least comparable to the comparison methods in clustering performance, and it requires fewer parameters than CFSFDP and its improved variants.

(3) The clustering results of the traditional FCM algorithm depend heavily on the randomly selected initial cluster centers, and the algorithm ignores both the differing influence of individual features and the differing importance of individual samples. To address these problems, we propose an information-entropy-weighted fuzzy clustering algorithm combined with adaptive nearest neighbors and density peaks (ANNDP-WFCM). First, the initial cluster centers are found automatically with the adaptive nearest neighbors density peaks algorithm (ANNDP): the nearest neighbors of each sample are found adaptively for data sets of different scales and structures, the local density of each sample is defined from its nearest-neighbor information, and the density peaks in the data set are taken as the initial cluster centers. Next, information entropy weighting distinguishes the importance of different features during clustering, while each sample is weighted by the reciprocal of the distance between samples, and the fuzzy cluster centers in the objective function are redefined accordingly. Finally, the objective function is optimized with the Lagrange multiplier method, alternately updating the membership matrix and the cluster centers until the final membership matrix gives the clustering result. Comparative experiments on different public data sets verify that ANNDP-WFCM needs fewer iterations and achieves higher clustering accuracy.
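For reference, the baseline CFSFDP procedure whose limitations are discussed above can be sketched as follows. This is a minimal NumPy sketch of the 2014 algorithm, not code from the thesis; the Gaussian kernel for the local density, the name `cfsfdp`, and choosing a fixed number `n_clusters` of centers by the largest values of rho times delta (instead of reading them off a decision graph) are assumptions made for illustration.

```python
import numpy as np


def cfsfdp(X, dc, n_clusters):
    """Minimal sketch of baseline CFSFDP (not the thesis's code).

    Assumptions: Gaussian-kernel density and the n_clusters samples with the
    largest gamma = rho * delta taken as centers. Returns (labels, centers).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Local density (assumed Gaussian kernel): rho_i = sum_{j != i} exp(-(d_ij / dc)^2).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # highest density first
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()  # convention for the global density peak
            continue
        higher = order[:rank]  # samples ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], nearest_higher[i] = D[i, j], j
    gamma = rho * delta
    # Sort by gamma (descending), breaking ties by density rank so the global
    # peak, whose gamma is always maximal, is always selected as a center.
    rank_of = np.empty(n, dtype=int)
    rank_of[order] = np.arange(n)
    centers = np.lexsort((rank_of, -gamma))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # One-pass assignment in decreasing density: each non-center copies the
    # label of its nearest denser sample, so one wrong link propagates to
    # every sample downstream of it (the "Domino Effect").
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

The final loop shows why a single wrong assignment can cascade: each sample copies the label of exactly one denser sample. The density values also depend directly on `dc`, which is the manual cutoff-distance choice that NaN-CFSFDP aims to automate.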
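The abstract does not give the exact density formula of RNN-CFSFDP. As a hedged illustration of the reverse-nearest-neighbor idea it builds on, the sketch below counts, for each sample, how many other samples include it among their k nearest neighbors. The function names `knn_indices` and `reverse_knn_counts` and the parameter `k` are hypothetical, and using the raw count as a density is an assumption.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each sample (self excluded)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    # Stable sort: ties between equidistant neighbors go to the lower index.
    return np.argsort(D, axis=1, kind="stable")[:, :k]


def reverse_knn_counts(X, k):
    """|RkNN(x_i)|: how many samples list x_i among their k nearest neighbors.

    Assumed here as a cutoff-free local density; not the thesis's formula.
    """
    knn = knn_indices(X, k)
    return np.bincount(knn.ravel(), minlength=len(knn))
```

For example, `reverse_knn_counts([[0], [1], [2], [10]], 1)` gives the counts 1, 2, 1, 0: a sample inside a dense run appears in many neighbor lists, while the isolated point at 10 is nobody's nearest neighbor. The count therefore behaves like a local density without any cutoff distance.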
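The natural neighbor search behind NaN-CFSFDP adapts the neighborhood size to the data instead of taking k as a parameter. Below is a minimal sketch of the search as it is commonly described: the search round r grows one neighbor at a time until every sample is someone's neighbor, or until the number of samples without reverse neighbors stops changing. The exact stopping rule, treating samples whose count stays at zero as outliers, defining natural neighbors as mutual lambda-nearest neighbors, and all names are assumptions. The thesis's own outlier and cutoff-distance rules are not given in the abstract and are not reproduced here.

```python
import numpy as np


def natural_neighbor_search(X):
    """Sketch of natural neighbor (NaN) search under an assumed stopping rule.

    Returns (lam, nb, nan_sets): the natural eigenvalue lambda, the number of
    reverse neighbors nb of each sample, and each sample's natural neighbors.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1, kind="stable")  # neighbors, nearest first
    nb = np.zeros(n, dtype=int)
    prev_zero = -1
    r = 0
    while r < n - 1:
        r += 1
        # Each sample "visits" its r-th nearest neighbor.
        np.add.at(nb, order[:, r - 1], 1)
        zero = int((nb == 0).sum())
        # Assumed rule: stop when every sample has a reverse neighbor or the
        # number of samples without one did not change in this round.
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    lam = r
    knn = [set(row[:lam].tolist()) for row in order]
    # Natural neighbors taken as mutual lambda-nearest neighbors.
    nan_sets = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, nb, nan_sets
```

On the one-dimensional points 0, 1, 2, 3 and 20, the search stops at lambda = 2 with reverse-neighbor counts 1, 3, 4, 2, 0. The point at 20 is the only sample with no reverse neighbor and no natural neighbors, so it would be flagged as an outlier under this assumed rule.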
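Two ingredients of ANNDP-WFCM can be sketched independently: information-entropy feature weights, and an FCM loop that starts from given centers (for example, density peaks) and alternates the Lagrange-derived membership and center updates. The entropy formula below is the common entropy weight method and is an assumption, as is plugging the weights into a squared Euclidean distance. The thesis's sample weighting by reciprocal distances and its ANNDP center search are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def entropy_feature_weights(X):
    """Assumed entropy weight method: more dispersed features weigh more."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Min-max normalize each feature to [0, 1]; constant features become 0.
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)
    # Share of each sample in each feature (epsilon avoids 0/0 and log(0)).
    P = (Z + 1e-12) / (Z + 1e-12).sum(axis=0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)  # normalized entropy per feature
    d = np.clip(1.0 - e, 0.0, None)  # degree of divergence
    if d.sum() <= 0:  # every feature uninformative: fall back to uniform weights
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


def weighted_fcm(X, centers, w, m=2.0, max_iter=100, tol=1e-6):
    """FCM with a feature-weighted squared Euclidean distance (sketch).

    centers: initial centers; w: feature weights. Returns (U, V, n_iter),
    where U has shape (c, n) and each column sums to 1.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(centers, dtype=float).copy()
    for it in range(1, max_iter + 1):
        # d2[k, i] = sum_j w_j (x_ij - v_kj)^2, floored to avoid division by 0.
        d2 = np.maximum((w * (X[None, :, :] - V[:, None, :]) ** 2).sum(-1), 1e-12)
        # Membership update from the Lagrangian: u_ki proportional to d2_ki^(-1/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # Center update: mean of the samples weighted by u_ki^m.
        Um = U ** m
        V_new = Um @ X / Um.sum(axis=1, keepdims=True)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V, it
```

Because the feature weights enter the distance as a diagonal scaling, the center update keeps the standard FCM form. Passing density-peak samples as `centers` replaces the random initialization that makes plain FCM results vary between runs.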
Keywords/Search Tags: Clustering, Density peaks, Reverse nearest neighbor, Natural neighbor, Fuzzy c-means