
Research On Clustering Algorithm Based On Density Analysis

Posted on: 2022-03-23    Degree: Master    Type: Thesis
Country: China    Candidate: L Chen
GTID: 2518306524483574    Subject: Mathematics
Abstract/Summary:
Clustering algorithms play an important role in data mining because they can group large-scale unlabeled data. The field has developed many branches, such as partitioning clustering, density-based clustering, and spectral clustering. This thesis studies two problems in density-based clustering.

The first problem is that the traditional density-based clustering algorithms derived from DBSCAN share a common feature: they set a single global density threshold to separate sparse regions from dense regions. This strategy fundamentally limits their ability to cluster data of variable density. Some recent algorithms can handle variable-density data to some extent, but they still rely on a single global threshold. Moreover, density is relative, and these methods remain weak when the density difference between clusters is very large and the boundaries between clusters are not obvious. The second problem is that most density-based clustering algorithms use the Euclidean distance to construct the similarity matrix, and on some data sets this distance criterion cannot produce a graph structure that meets the algorithm's requirements.

This thesis proposes two algorithms to address these problems. For the first problem, we propose a density-based core structure expansion clustering algorithm. Its main principle is to apply density clustering only in relatively high-density regions and partitioning clustering in low-density regions, which avoids using a single global threshold. In the relatively high-density regions, we adopt a stricter local gap density (LGD) clustering algorithm to build core density structures, which serve as the initial clusters. Then, following the idea of partitioning clustering, these core density structures are treated as the representative points of the partition. Finally, the core density structures are expanded in all directions by a fixed step size to cluster the remaining points. Outliers that cannot be reached by the expansion are identified as noise.

For the second problem, we propose a variable density clustering algorithm based on the l2-graph. The l2-graph computes reconstruction coefficients based on an l2-norm projection space, and it can construct a similarity graph (or similarity matrix) that better reflects the relationships between data points. Its main basis is that the coefficients over data points within the same projection space are larger than those over data points in different projection spaces; that is, data within a cluster are more similar to each other than data in different clusters. We therefore replace the k-NN graph in the LGD algorithm with the l2-graph to improve variable density clustering.

To verify the two proposed algorithms, we construct two-dimensional data sets tailored to each problem and run demonstration experiments. To show that the results generalize, we also compare clustering accuracy on four general-purpose data sets. The experimental results show that our algorithms handle the above problems well and achieve higher accuracy than the related algorithms.
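The expansion stage described above can be illustrated with a minimal sketch. It assumes the core density structures have already been found (the LGD step is not implemented here) and are passed in as integer labels, with -1 for every point outside a core structure. In each round, every unassigned point lying within a hypothetical radius `step` of an already-labelled point takes the label of its nearest labelled neighbour, so clusters grow outward one step at a time; points that are never reached stay at -1 and are reported as noise. The function name, `step`, and `max_rounds` are illustrative assumptions, not the thesis's actual interface.

```python
import numpy as np


def expand_core_structures(X, labels, step, max_rounds=100):
    """Grow pre-computed core structures outward by `step` per round.

    X: (n, d) data array.
    labels: (n,) ints; cluster id for core points, -1 for all other points.
    step: hypothetical expansion radius applied in every round.
    Returns a new label array; points still labelled -1 are treated as noise.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    for _ in range(max_rounds):
        assigned = np.flatnonzero(labels >= 0)
        pending = np.flatnonzero(labels < 0)
        if assigned.size == 0 or pending.size == 0:
            break
        # Distances from every pending point to every labelled point.
        d = np.linalg.norm(X[pending, None, :] - X[None, assigned, :], axis=2)
        nearest = d.argmin(axis=1)
        reach = d[np.arange(pending.size), nearest] <= step
        if not reach.any():
            break  # the frontier can no longer grow; the rest is noise
        # Each reached point inherits the label of its nearest labelled point.
        labels[pending[reach]] = labels[assigned[nearest[reach]]]
    return labels


# Two seeds (points 0 and 3) grow along a line; point 5 is too far away.
X = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [50, 0]]
print(expand_core_structures(X, [0, -1, -1, 1, -1, -1], step=1.5).tolist())
```

Because the frontier grows by at most `step` per round, a large gap between a cluster and a stray point halts the expansion, which is how this sketch separates noise from cluster members. The pairwise-distance computation is quadratic per round, so a spatial index would be preferable for large data.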
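The l2-graph idea can likewise be sketched, assuming the common ridge-regression formulation: each point x_i is reconstructed from all other points by minimizing ||x_i - D_i c||^2 + λ||c||^2, only the k largest coefficient magnitudes are kept, and the result is symmetrized into an affinity matrix that could stand in for a k-NN graph. The function name and the parameters `lam` and `k` are hypothetical choices for illustration; the thesis's exact construction may differ.

```python
import numpy as np


def l2_graph(X, lam=0.1, k=3):
    """Sketch of an l2-graph affinity matrix (ridge-regression formulation).

    Each point x_i is coded by the other points:
        c_i = argmin_c ||x_i - D_i c||^2 + lam * ||c||^2,
    where the columns of D_i are all points except x_i. Only the k largest
    |coefficients| per point are kept, then the matrix is symmetrized.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        D = X[others].T                        # (d, n-1) dictionary of other points
        G = D.T @ D + lam * np.eye(n - 1)      # regularized Gram matrix
        c = np.linalg.solve(G, D.T @ X[i])     # closed-form ridge solution
        keep = np.argsort(-np.abs(c))[:k]      # hard-threshold to the k largest
        C[i, others[keep]] = np.abs(c[keep])
    return C + C.T                             # symmetric, zero diagonal


# Points on two orthogonal lines (two 1-D subspaces of R^3).
X = [[1, 0, 0], [2, 0, 0], [3, 0, 0], [0, 1, 0], [0, 2, 0], [0, 3, 0]]
W = l2_graph(X, lam=0.1, k=2)
print(np.round(W, 3))
```

With points drawn from two orthogonal lines, the coefficients linking the two lines vanish, so the affinity matrix is block-diagonal; this is the within-cluster versus between-cluster coefficient gap the abstract relies on. Solving one (n-1)×(n-1) system per point costs roughly O(n^4) overall, which is acceptable for a sketch but not for large data sets.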
Keywords/Search Tags: Variable density clustering, Partitioning clustering, l2-graph, Noise reduction, Reconstruction coefficient