Research Of Density-based Clustering Algorithm By KNN

Posted on: 2020-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wen
GTID: 2428330578956089
Subject: Computer software and theory
Abstract/Summary:
In recent years, as datasets have grown larger, the distribution of data points has become more complicated. Datasets containing clusters with arbitrary shapes, varying densities, and different sizes arise in many applications. Traditional clustering algorithms cannot effectively identify the cluster structure of such datasets, and when state-of-the-art clustering algorithms are used to detect arbitrarily shaped clusters in various datasets, their results are often not sufficiently effective or efficient. Effective and efficient detection of clusters with arbitrary shapes, varying densities, and different sizes in complex datasets is therefore a key open problem and an active research topic in clustering. The k-nearest neighbors (KNN) method is a classification algorithm that takes into account both the characteristics of the data points and their spatial locations, such as the distribution of the data points. In density-based methods, density is generally determined by how tightly the data are distributed, and such methods are also well suited to detecting clusters with arbitrary shapes. Building on a study of the k-nearest neighbors method and density-based clustering algorithms, this thesis proposes two KNN-based clustering algorithms, CUDG and CLDB.

(1) The CUDG algorithm treats each data point as a particle in nature and defines the density gravitation between two data points. First, the local density of each point is computed from the distribution of its surrounding neighbors. Then each data point is iteratively assigned to the adjacent point that has a larger density and is closest to it. Finally, initial clusters that share the same points are merged to obtain the final clusters. In the experiments, CUDG and six comparison algorithms are tested on twelve datasets of different dimensions and types. The results show that CUDG achieves better clustering performance than the compared algorithms.

(2) The CLDB algorithm defines a function to calculate the local density. It first obtains the basic structure of the clusters through a density backbone, which is formed by combining the data points whose density is larger than that of all their mutual neighbors with those mutual neighbors. It then assigns each point to its nearest neighbor with a greater density, yielding the final partition of the dataset. To verify its performance, twelve datasets with class labels and two datasets without class labels are used as benchmarks, and the same comparison algorithms used for CUDG serve as baselines. The results show that CLDB outperforms the compared algorithms.

Both algorithms can effectively identify the true cluster structure on different types of datasets. CUDG partitions a dataset through the force between pairs of points, while CLDB identifies different types of clusters through cluster backbones, which form when high-density points (points denser than all their mutual neighbors) attract their mutual neighbors. The time complexity of both algorithms is close to O(n log n).
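The CUDG procedure described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the density-gravitation formula nor the local-density formula, so the sketch assumes local density is the inverse of the mean distance to the k nearest neighbors, reads "adjacent point" as one of the k nearest neighbors, and uses a hypothetical merge rule (two initial clusters merge when the k-neighborhood of one cluster's density peak contains points of the other). The gravitation force itself is not modeled, and the names `knn`, `local_density`, and `cudg_like` are illustrative.

```python
import math


def knn(points, k):
    """Indices of the k nearest neighbours of every point (self excluded).

    Brute force, O(n^2 log n); ties are broken by the smaller index.
    """
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def local_density(points, neighbours):
    """ASSUMED density: inverse of the mean distance to the k nearest neighbours."""
    rho = []
    for i, nb in enumerate(neighbours):
        mean_d = sum(math.dist(points[i], points[j]) for j in nb) / len(nb)
        rho.append(1.0 / (mean_d + 1e-12))  # epsilon avoids division by zero for duplicates
    return rho


def cudg_like(points, k=3):
    """Hypothetical CUDG-style clustering; returns one integer label per point."""
    n = len(points)
    nb = knn(points, k)
    rho = local_density(points, nb)
    # rank 0 = densest point; ties broken by index, so "denser" is a strict total order
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    # Step 1: link each point to its closest denser k-nearest neighbour;
    # points with no denser neighbour are roots (local density peaks).
    parent = list(range(n))
    for i in range(n):
        denser = [j for j in nb[i] if rank[j] < rank[i]]
        if denser:
            parent[i] = min(denser, key=lambda j: math.dist(points[i], points[j]))

    def root(i):
        while parent[i] != i:  # rank strictly decreases along the chain, so this terminates
            i = parent[i]
        return i

    tree = [root(i) for i in range(n)]  # initial cluster = tree under one density peak

    # Step 2 (ASSUMED merge rule): merge two initial clusters when the k-neighbourhood
    # of one cluster's peak contains points that belong to the other cluster.
    link = {r: r for r in tree}

    def find(r):
        while link[r] != r:
            r = link[r]
        return r

    for r in set(tree):
        for j in nb[r]:
            a, b = find(r), find(tree[j])
            if a != b:
                link[b] = a

    ids = {}  # relabel final clusters as 0, 1, 2, ... in order of first appearance
    return [ids.setdefault(find(t), len(ids)) for t in tree]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
pts += [(x + 10, y + 10) for x, y in pts]
print(cudg_like(pts, k=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Following parent links always terminates because each step moves to a point earlier in the density ordering. The brute-force neighbor search keeps the sketch short but makes it quadratic; a spatial index such as a k-d tree would be needed to approach the complexity the abstract reports.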
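The two-stage structure of CLDB described in (2), a density backbone followed by assignment to the nearest denser point, can likewise be sketched in Python under explicit assumptions. The thesis defines its own local-density function, which the abstract does not give, so this sketch assumes the inverse mean k-nearest-neighbor distance. Mutual neighbors are taken to be pairs where each point is in the other's k-nearest-neighbor list, backbone groups that share a point are assumed to form one cluster, and "nearest neighbor with greater density" is searched over all points. The names `knn` and `cldb_like` are hypothetical.

```python
import math


def knn(points, k):
    """Indices of the k nearest neighbours of every point (brute force, ties by index)."""
    out = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append([j for _, j in ranked[:k]])
    return out


def cldb_like(points, k=3):
    """Hypothetical CLDB-style clustering: density backbone first, then assignment."""
    n = len(points)
    nb = knn(points, k)
    # ASSUMED density: inverse of the mean distance to the k nearest neighbours
    rho = [1.0 / (sum(math.dist(points[i], points[j]) for j in nb[i]) / len(nb[i]) + 1e-12)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first, ties by index
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    # Mutual neighbours: j is in kNN(i) and i is in kNN(j)
    nb_sets = [set(s) for s in nb]
    mutual = [[j for j in nb[i] if i in nb_sets[j]] for i in range(n)]

    # Step 1: backbone = points denser than all their mutual neighbours, plus those
    # neighbours. Groups sharing a point are joined with union-find (ASSUMED rule).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    in_backbone = [False] * n
    for i in range(n):
        if mutual[i] and all(rank[i] < rank[j] for j in mutual[i]):
            in_backbone[i] = True
            for j in mutual[i]:
                in_backbone[j] = True
                parent[find(j)] = find(i)
    label = [find(i) if in_backbone[i] else None for i in range(n)]

    # Step 2: visit the other points from densest to sparsest and copy the label of
    # the nearest denser point, which is already labelled because it comes earlier.
    for i in order:
        if label[i] is None:
            denser = order[:rank[i]]
            if denser:
                target = min(denser, key=lambda j: math.dist(points[i], points[j]))
                label[i] = label[target]
            else:
                label[i] = i  # densest point outside any backbone starts its own cluster

    ids = {}  # relabel clusters as 0, 1, 2, ... in order of first appearance
    return [ids.setdefault(c, len(ids)) for c in label]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
pts += [(x + 10, y + 10) for x, y in pts]
print(cldb_like(pts, k=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy input each blob's center is denser than all its mutual neighbors, so it seeds a backbone together with three corners; the remaining corner of each blob is then attached to its nearest denser point, the center. As with any brute-force sketch, the cost here is quadratic in n.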
Keywords/Search Tags:Clustering Analysis, Arbitrary Clusters, Density Gravitation, Local Density, Density Backbone