Study On Clustering Algorithms Based On Density And Direction

Posted on: 2020-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: N Xiao
GTID: 2428330620451105
Subject: Computer Science and Technology
Abstract/Summary:
As a recently proposed density-based clustering algorithm, the density peak clustering algorithm (DPC) has received wide attention from researchers. It is based on two simple assumptions and clusters many synthetic datasets effectively. However, DPC has two main problems. First, it requires the user to manually select the cluster centers from the decision graph, which is not an easy task on many real datasets. Second, DPC struggles to achieve good clustering results on datasets with uneven density distribution, a problem that is ubiquitous in density-based clustering algorithms. For the first problem, this thesis provides an optimization scheme that lets DPC select the cluster centers automatically. For the second problem, in order to mitigate the influence of uneven density distribution on clustering performance, this thesis takes direction as the main physical metric and proposes DBCA, a direction-based clustering algorithm built on the orientation relationship between data points. The detailed work of this thesis is as follows:

1) Clustering algorithms that can find clusters of arbitrary shape are studied in depth, and the existing optimization schemes for DPC are analyzed. In addition, the evaluation metrics clustering accuracy, adjusted Rand index, normalized mutual information and adjusted mutual information are introduced, and the clustering algorithms used in the experiments are described in detail.

2) To address the difficulty of manually selecting cluster centers in DPC, a two-stage clustering algorithm, KDPC, is proposed based on the clustering ideas of K-means++ and DPC. Given the number of clusters in advance, KDPC automatically obtains that number of cluster centers. Experiments show that KDPC always produces a good clustering result when the parameter d_c is chosen so that the ratio of the average number of neighbors to the total number of data points in the dataset falls within a fixed set of values. The experiments also show that KDPC and DPC achieve similar clustering quality on both synthetic and real datasets, and that KDPC performs better than DPC on datasets whose clusters differ significantly in density.

3) A clustering algorithm, DBCA, that is not susceptible to uneven density distribution is proposed. DBCA uses direction as its core physical metric and uses orientation information between data points to guide clustering. DBCA does not need the number of clusters as input. Although it requires two parameters, they are independent of each other and each has a fixed set of empirical values. Experiments show that DBCA can identify clusters of different shapes, sizes and densities, and that it deals with noise effectively when the first parameter represents the neighborhood radius. In addition, compared with DBSCAN and DPC, DBCA consistently achieves better clustering results on datasets with uneven density distribution. Compared with some state-of-the-art clustering algorithms, DBCA also consistently achieves better clustering results.
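For readers unfamiliar with DPC, the two quantities plotted on its decision graph can be sketched as below. This is a minimal illustration of the standard DPC definitions only: the local density rho counts neighbors within the cutoff d_c, and delta is the distance to the nearest point of higher density. It is not the KDPC or DBCA algorithm proposed in the thesis. The function names and the top-k selection by rho * delta are assumptions made for illustration, a common heuristic rather than the thesis's method.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta) per point using the standard DPC definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points strictly closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distances to all points with strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser point gets the largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def top_k_centers(rho, delta, k):
    """Hypothetical helper: pick the k points with the largest rho * delta."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]


# Two small square clusters, each with one point in the middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta = dpc_quantities(points, d_c=0.8)
centers = top_k_centers(rho, delta, k=2)
```

On this toy data the two middle points have both the highest density and the largest delta, so they stand out as candidate centers. On real data the decision graph is rarely this clean, which is the difficulty the automatic center selection in the thesis targets.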
Keywords/Search Tags:Clustering, Cluster Centers, Density, Direction, Uneven Density Distribution