
Research On Quick Clustering Algorithm Based On Density Subgraph

Posted on: 2022-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X C Zheng
GTID: 2518306539469334 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer technology and the emergence of massive amounts of information on the Internet, using data mining techniques to uncover hidden information has become an important research topic. Cluster analysis is an important tool in data mining, and among the many clustering algorithms, density-based clustering is widely used because it can discover clusters of arbitrary shape and is insensitive to noise in the data set. A density-based clustering algorithm can be understood as one that searches for the points of highest density, which are usually regarded as the centers of clusters; however, considering only the highest-density points tends to cause over-segmentation. Many improved algorithms have been proposed to address this, but each of them still has shortcomings. In response, this thesis proposes a fast clustering algorithm based on density subgraphs (Quick DSC). The algorithm evaluates sample points in two dimensions, density and distance, while preserving their density information, which improves the efficiency of density-based clustering without sacrificing clustering quality. The main research work of this thesis can be summarized in three aspects:

(1) The density of each sample point is computed by k-nearest-neighbor density estimation, which preserves the point's local density information. Compared with computing densities from global parameters, k-nearest-neighbor density estimation reduces the influence of parameter choice on the clustering result and is better suited to data sets with uneven distributions.
(2) Density subgraphs are used to avoid over-segmentation during clustering. On the mutual k-nearest-neighbor graph, sample points whose density exceeds a threshold are connected to build density subgraphs; the density relationships between sample points are recorded, and a representative point is identified for each density subgraph.
(3) A "density-distance" decision diagram is used to return the desired K clusters quickly. The importance of each density subgraph is estimated in the two dimensions of density and distance, the K most important samples in the decision diagram are selected as the initial cluster centers, and the remaining objects are assigned to clusters to obtain the final result. Because only the representative points of the density subgraphs are evaluated, the scale of the data to be processed and the computational cost are greatly reduced; this also solves the problem that density-based clustering algorithms cannot return a user-specified number of clusters.

Experiments on a large number of artificial and real data sets, evaluated with several clustering metrics, show that the proposed algorithm achieves excellent clustering quality and significantly improves clustering efficiency.
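The abstract describes the three steps only in words, so the Python sketch below is one possible reading of them, not the thesis's implementation. Every concrete choice in it is an assumption: density is the inverse mean distance to the k nearest neighbors, the density threshold is the mean density, each subgraph's representative is its densest member, a representative's importance is density times distance (the rho * delta score used in density peak clustering), and points outside every subgraph join the cluster of their nearest denser point. The function names `knn_lists` and `quick_dsc_sketch` are hypothetical, and distances are computed by brute force in O(n^2) for clarity.

```python
import math
from collections import deque


def knn_lists(points, k):
    """Brute-force O(n^2) distances; neighbors[i] holds the k nearest other points of i."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        neighbors.append(others[:k])
    return neighbors, dist


def quick_dsc_sketch(points, k, n_clusters):
    """Hypothetical Quick DSC-style pipeline (n_clusters >= 1); returns one label per point."""
    n = len(points)
    nbrs, dist = knn_lists(points, k)
    nbr_sets = [set(s) for s in nbrs]

    # (1) kNN density (assumed estimator): inverse mean distance to the k nearest neighbors.
    rho = [1.0 / (sum(dist[i][j] for j in nbrs[i]) / k + 1e-12) for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first, index breaks ties
    rank = {p: r for r, p in enumerate(order)}

    # (2) Density subgraphs: connected components of the mutual-kNN graph restricted
    # to points whose density reaches the threshold (assumed: the mean density).
    threshold = sum(rho) / n
    dense = [r >= threshold for r in rho]
    comp = [-1] * n
    reps = []  # one representative per subgraph, discovered in decreasing-density order
    for start in order:
        if not dense[start] or comp[start] != -1:
            continue
        cid = len(reps)
        reps.append(start)  # first point reached in density order = densest member
        comp[start] = cid
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if dense[v] and comp[v] == -1 and u in nbr_sets[v]:  # mutual kNN edge
                    comp[v] = cid
                    queue.append(v)

    # (3) Decision graph over representatives only: delta = distance to the nearest
    # denser representative; score = rho * delta, as in density peak clustering.
    max_d = max(dist[a][b] for a in reps for b in reps)
    delta = {r: min((dist[r][s] for s in reps[:i]), default=max_d) for i, r in enumerate(reps)}
    ranked = sorted(reps, key=lambda r: (-rho[r] * delta[r], rank[r]))
    center_id = {c: cid for cid, c in enumerate(ranked[:n_clusters])}

    label = [-1] * n
    for i, r in enumerate(reps):  # non-center representatives follow their nearest denser one
        if r in center_id:
            label[r] = center_id[r]
        else:
            label[r] = label[min(reps[:i], key=lambda s: dist[r][s])]
    for p in range(n):  # every subgraph member inherits its representative's label
        if comp[p] != -1:
            label[p] = label[reps[comp[p]]]
    for pos, p in enumerate(order):  # low-density points: nearest denser, already-labeled point
        if label[p] == -1:
            label[p] = label[min(order[:pos], key=lambda q: dist[p][q])]
    return label


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(x + 10, y + 10) for x, y in blob_a]
    print(quick_dsc_sketch(blob_a + blob_b, k=3, n_clusters=2))
    # -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Only the representatives enter the decision graph, which is where the reduction in data scale described in step (3) would come from under these assumptions. If there are fewer density subgraphs than `n_clusters`, this sketch simply returns fewer clusters.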
Keywords/Search Tags: Density estimation, k-NN graph, Quick clustering, Density peak clustering, Clustering algorithm