Research Of Clustering Algorithms For Detecting Arbitrary Clusters Based On K Nearest Neighbors

Posted on: 2023-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Zhang
GTID: 2568306848477524
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is a core technology in machine learning and data mining. It has been applied in many fields, such as pattern recognition, image analysis, and statistical data analysis. Cluster analysis is the process of partitioning a set of data objects into subsets. Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. With the rapid development and wide application of technologies such as the Internet of Things, artificial intelligence and cloud computing, the amount of data is constantly increasing, and a large number of datasets with complex distributions are being generated. These datasets contain clusters of arbitrary shape and density, which pose new challenges to clustering algorithms. This thesis studies clustering algorithms based on the k-nearest neighbor method for datasets containing clusters of arbitrary shape and density, and proposes the kNNC and DCTC algorithms. The specific research results are as follows:

(1) The kNNC algorithm finds clusters of arbitrary shape and density based on the idea that the two nearest points should be assigned to the same cluster, and that two closely related clusters can be merged into one. First, a new similarity measure between data points is proposed based on their k-nearest neighbor relations. This measure adapts to changes in the density of data points within the dataset. Then, for each pair of data points, if the number of common neighbors between them is greater than a given threshold, the pair is added to the same cluster to form an initial cluster. Next, if the data points in two initial clusters share a large number of common neighbors, the two clusters are merged into one. In this process, if the number of common neighbors between a data point and other data points is less than the given threshold, the point is identified as noise. In the
experiments, the kNNC algorithm was compared with 2 classical clustering algorithms and 4 recent high-performing clustering algorithms on 13 datasets containing clusters of arbitrary shape and density and on 6 multi-dimensional datasets. The results show that the kNNC algorithm can find clusters of arbitrary shape and density quickly and efficiently, and can recognize noise points.

(2) DCTC is a clustering algorithm that discovers cluster trunks (backbones) based on k nearest neighbors. The algorithm first defines the local density of each data point using the relation between its k nearest neighbors and its reverse k nearest neighbors. This density definition eliminates the influence that large density differences between clusters in the dataset would otherwise have on the clustering results. Then, according to their local density, the data points are divided into core region points, boundary region points and noise region points. This division eliminates the influence of clusters with arbitrary shape, arbitrary density distribution and multiple centers on the clustering results. Next, the points in the core region are clustered to form the cluster trunks. Finally, each point in the boundary region is allocated to the cluster most closely related to it to construct the final clusters. The DCTC algorithm was compared with 2 classical clustering algorithms and 4 recent clustering algorithms on 13 datasets containing clusters of arbitrary shape and density and on 5 multi-dimensional datasets. The results show that DCTC performs better at detecting clusters of arbitrary shape and density, and also performs better on unbalanced datasets and on datasets whose clusters have multiple centers.

The two clustering algorithms proposed in this study can accurately find clusters of arbitrary shape and density in datasets with complex distributions, and both have a low time complexity of O(n log n).
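The abstract does not give kNNC's exact similarity formula or thresholds, so the following is only a minimal Python sketch of the shared-neighbor idea described in (1), under stated assumptions: Euclidean distance, brute-force neighbor search (O(n²), not the O(n log n) reported in the abstract), a Jarvis-Patrick-style rule that counts shared neighbors only between mutual k-nearest neighbors, and hypothetical parameters `min_shared` (pair/noise threshold) and `min_links` (cluster-merge threshold). The names `knn_sets`, `UnionFind` and `knnc_sketch` are illustrative and do not come from the thesis.

```python
import math
from collections import Counter


def knn_sets(points, k):
    """Brute-force k-nearest-neighbor index sets (O(n^2); a KD-tree would be faster)."""
    n = len(points)
    result = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        result.append(set(order[:k]))
    return result


class UnionFind:
    """Disjoint-set forest used to grow and merge clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def knnc_sketch(points, k=5, min_shared=2, min_links=3):
    """Hypothetical shared-neighbor clustering; returns one label per point, -1 = noise."""
    n = len(points)
    nn = knn_sets(points, k)

    def snn(i, j):
        # Assumption: shared neighbors count only if i and j are mutual kNN.
        if i in nn[j] and j in nn[i]:
            return len(nn[i] & nn[j])
        return 0

    # Noise: a point sharing fewer than min_shared neighbors with all of its kNN.
    noise = {i for i in range(n)
             if max((snn(i, j) for j in nn[i]), default=0) < min_shared}

    # Stage 1: initial clusters from neighbor pairs with enough shared neighbors.
    uf = UnionFind(n)
    for i in range(n):
        for j in nn[i]:
            if i not in noise and j not in noise and snn(i, j) >= min_shared:
                uf.union(i, j)

    # Stage 2: merge initial clusters joined by at least min_links kNN links.
    links = Counter()
    for i in range(n):
        for j in nn[i]:
            if i in noise or j in noise:
                continue
            a, b = uf.find(i), uf.find(j)
            if a != b:
                links[frozenset((a, b))] += 1
    for pair, count in links.items():
        if count >= min_links:
            a, b = tuple(pair)
            uf.union(a, b)

    # Relabel union-find roots as consecutive cluster ids.
    ids, labels = {}, []
    for i in range(n):
        labels.append(-1 if i in noise else ids.setdefault(uf.find(i), len(ids)))
    return labels
```

For example, two 3×3 integer grids placed far apart plus one distant point, clustered with `k=4, min_shared=1`, give one label per grid and `-1` for the isolated point, because no grid point lists the isolated point among its nearest neighbors.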
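The DCTC pipeline described in (2) (local density from k-nearest and reverse k-nearest neighbors, a core/boundary/noise split, trunks built from core points, boundary points attached afterwards) can be illustrated with another minimal Python sketch. The abstract does not define DCTC's density or region thresholds, so this sketch assumes that the density is the reverse-kNN count, that a point is core when that count is at least `k`, and that it is noise when the count is at most a hypothetical `noise_max`. The names `knn_lists` and `dctc_sketch` are illustrative only.

```python
import math
from collections import Counter


def knn_lists(points, k):
    """Brute-force k-nearest-neighbor index lists (O(n^2) for clarity)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def dctc_sketch(points, k=5, noise_max=0):
    """Hypothetical trunk-based clustering; returns one label per point, -1 = noise."""
    n = len(points)
    nn = knn_lists(points, k)

    # Assumed local density: number of points listing i among their kNN (reverse kNN).
    rnn = [0] * n
    for i in range(n):
        for j in nn[i]:
            rnn[j] += 1

    # Assumed region split: core, noise, and boundary (everything else).
    core = [r >= k for r in rnn]
    noise = [not core[i] and rnn[i] <= noise_max for i in range(n)]

    # Undirected kNN graph restricted to core points.
    adj = [[] for _ in range(n)]
    for i in range(n):
        if core[i]:
            for j in nn[i]:
                if core[j]:
                    adj[i].append(j)
                    adj[j].append(i)

    # Cluster trunks: connected components of the core graph (iterative DFS).
    labels = [-1] * n
    next_id = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if labels[j] == -1:
                    labels[j] = next_id
                    stack.append(j)
        next_id += 1

    # Boundary points join the trunk most common among their core neighbors.
    for i in range(n):
        if core[i] or noise[i]:
            continue
        votes = Counter(labels[j] for j in nn[i] if core[j])
        if votes:
            labels[i] = votes.most_common(1)[0][0]
    return labels
```

Counting reverse neighbors rather than measuring distances keeps this stand-in density scale-free, which is one plausible way to reduce the effect of density differences between clusters; whether it matches the thesis's definition is not stated in the abstract.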
Keywords/Search Tags: Clustering Analysis, k-Nearest Neighbor, Density of Data Points, Clusters of the Trunk, Arbitrary Clusters