Font Size: a A A

A Fast Clustering Algorithm Based On Local Density And Framework Distance Between Clusters

Posted on:2023-09-19Degree:MasterType:Thesis
Country:ChinaCandidate:X G ChenFull Text:PDF
GTID:2558307079988339Subject:Computer technology
Abstract/Summary:PDF Full Text Request
This paper proposes a fast clustering algorithm based on local density and framework cluster distance measurement,called DFC algorithm.Different from the traditional classification standards of clustering algorithms(based on division,hierarchy,density,etc.),from the perspective of graph clustering,this paper divides the clustering algorithms into two categories:based on Prim MST algorithm or based on Kruskal MST algorithm.DFC is an algorithm that uses the ideas of Prim MST and Kruskal MST at the same time.The DFC algorithm is proposed based on a simple basic idea: any cluster is composed of several convex subclusters.Based on this idea,different from the traditional clustering algorithm,the DFC algorithm has two basic stages: pre-clustering and subclusters merging.The pre-clustering process mainly includes the following steps:(1)Calculate the local density of each sample;(2)Calculate the closest distance between each sample and the sample with higher density;(3)Based on the above two values,select the sample with higher local density and farther away from other high-density samples as the center point of the subcluster;(4)Divide the remaining non-central samples into corresponding subclusters according to certain rules to obtain pre-clustering results.Based on the above method,the pre-clustering stage of DFC will get more subclusters than the target clusters number.In the subsequent subcluster merging stage,DFC simply needs to iteratively merge the two closest subclusters until the number of clusters meets the target value.It is worth noting that DFC uses a framework-based measure of calculate cluster distance.For each subcluster,several samples are selected as the frame points of the cluster through a specific algorithm,and these frame points can usually well represent the shape and structure of the cluster.Compared with other cluster distance measurements,this method reduces the computational complexity,and at the same time,the impact of collecting outliers is small.In the data graph generated by DFC in the first stage,each sample(vertex)only builds an edge between itself and the closest samples with higher density.Compared with the complete graph based on the dataset,this greatly reduces the number of edges.In the data graph generated in the second stage of DFC,each vertex represents a subcluster,thus compared with the complete graph based on the dataset,which again greatly reduces the number of vertices.Analysis shows that ideally,the time complexity of the DFC algorithm is only (9)log 9)),and the space complexity is only (9)).The comparison experiments with other 7 clustering algorithms on 8 artificial datasets and12 real-world datasets show that the overall clustering accuracy of the DFC algorithm is the best,and the average clustering accuracy is improved by about 4% compared with the second place.Also,the clustering results of DFC are in the top 3 on all experimental datasets.Meanwhile,the actual running time of DFC is at the same level as other (9)log 9))complexity algorithms.
Keywords/Search Tags:Clustering, Pre-clustering, Subcluster merging, Local density, Decision graph, Framework distance, Graph clustering
PDF Full Text Request
Related items