Clustering is one of the most widely used unsupervised machine learning techniques and an important tool for data mining. The goal of a clustering algorithm is to partition a data set into clusters according to some criterion, so that data within the same cluster are highly similar and data in different clusters are dissimilar. Clustering algorithms are now applied in many fields of artificial intelligence. In practical applications, however, the diversity of the data means that many existing clustering algorithms fail to achieve satisfactory results. For example, most current clustering algorithms cannot handle noisy data sets with complex shapes, such as non-convex clusters, and many require the optimal number of clusters for a data set to be specified manually. Both difficulties remain pressing problems for clustering researchers. This thesis proposes two new clustering algorithms to address them: a clustering algorithm that constructs a connected graph from reverse nearest neighbors, and an improved hierarchical clustering algorithm based on natural-nearest-neighbor density denoising.

To handle noisy data sets with complex structures such as non-convex clusters, this thesis proposes a clustering algorithm that constructs a connected graph from reverse nearest neighbors. To reduce the influence of noise on the clustering process and improve robustness, a density formula is first designed that computes the density of each data point from the maximum number of reverse neighbors obtained during the natural neighbor search, and a dynamic noise discriminator is then constructed to denoise the data set. Second, the natural neighbor search is run on the denoised data set to obtain the number of reverse neighbors of every data point; a reverse nearest neighbor connectivity graph, restricted by these reverse-neighbor counts, is built to capture the internal structure of the data set, and clusters are merged until the given number of clusters is reached, which yields the clustering of the denoised data set. Finally, the noise points are assigned to clusters to obtain the final clustering result. The method was applied to 9 synthetic and 5 real data sets and compared with five other clustering algorithms; the experiments show that it outperforms the comparison algorithms on all 14 data sets.

To address the inability of most existing clustering algorithms to identify the number of clusters automatically, this thesis proposes an improved hierarchical clustering algorithm based on natural-nearest-neighbor density denoising. To reduce the influence of noise, the data set is first denoised in a way that preserves its internal structure. Second, a k-nearest-neighbor directed graph is constructed on the denoised data, and clusters are formed by iteratively merging the graph as directed edges are added in ascending order; at the same time, a strategy is designed to determine the optimal number of clusters, which yields the best clustering of the denoised data set. Finally, the noise points are assigned to clusters to obtain the final clustering result. To verify its effectiveness, the method was applied to 12 synthetic and 6 real data sets and compared with five other clustering algorithms. The experiments show that it not only identifies the optimal number of clusters accurately but also produces better clustering results.
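Both algorithms start from the natural neighbor search. The Python sketch below illustrates the pipeline of the first algorithm under simplifying assumptions; it is not the thesis's implementation. The search uses a simplified stopping rule (stop when no point is left without reverse neighbors, or when that number stops changing between rounds). The fixed-threshold `flag_noise` is a hypothetical stand-in for the density formula and dynamic noise discriminator, which the abstract does not specify, and the graph links mutual λ-nearest neighbors instead of applying the thesis's reverse-neighbor restriction. Remaining components are merged by single-link distance until the requested number of clusters is reached, and noise points are left labeled -1 rather than assigned. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def natural_neighbor_search(points: List[Point]) -> Tuple[int, List[int], List[List[int]]]:
    """Simplified natural neighbour search (assumed stopping rule).

    In round r every point reaches its r-th nearest neighbour, whose
    reverse-neighbour count grows by one. The search stops when no point has
    zero reverse neighbours, or when that number stops changing.
    Returns (lam, reverse-neighbour counts, neighbour order per point).
    """
    n = len(points)
    # Brute force: every other point, sorted by Euclidean distance.
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j])) for i in range(n)]
    counts = [0] * n
    prev_zero, lam = -1, 0
    while lam < n - 1:
        lam += 1
        for i in range(n):
            counts[order[i][lam - 1]] += 1
        zero = counts.count(0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    return lam, counts, order


def flag_noise(counts: List[int], threshold: float = 0.2) -> List[bool]:
    """Hypothetical stand-in for the density/noise step: density is the
    reverse-neighbour count divided by the maximum count, and points below a
    fixed threshold are flagged (NOT the thesis's dynamic discriminator)."""
    top = max(counts) or 1
    return [c / top < threshold for c in counts]


def cluster(points: List[Point], n_clusters: int, threshold: float = 0.2) -> List[int]:
    """Label non-noise points 0..n_clusters-1; noise points get -1."""
    lam, counts, order = natural_neighbor_search(points)
    noise = flag_noise(counts, threshold)
    parent = list(range(len(points)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Hypothetical graph rule: link mutual lam-nearest neighbours that are both non-noise.
    near = [set(o[:lam]) for o in order]
    for i in range(len(points)):
        for j in near[i]:
            if not noise[i] and not noise[j] and i in near[j]:
                parent[find(i)] = find(j)

    # Merge the two closest components (single link) until n_clusters remain.
    keep = [i for i in range(len(points)) if not noise[i]]
    while len({find(i) for i in keep}) > n_clusters:
        _, a, b = min((math.dist(points[i], points[j]), i, j)
                      for i in keep for j in keep if find(i) != find(j))
        parent[find(a)] = find(b)

    roots = sorted({find(i) for i in keep})
    return [-1 if noise[i] else roots.index(find(i)) for i in range(len(points))]
```

For two well-separated 2×2 grids plus a distant outlier, `cluster(points, 2)` labels the grids 0 and 1 and marks the outlier -1. The distance computations are brute force to keep the sketch dependency-free; a spatial index would be preferable on real data.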
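The merge-and-select step of the second algorithm can be sketched in the same hedged way, assuming the input has already been denoised. In this illustration, the directed edges i → j of the k-nearest-neighbor graph are added one neighbor rank at a time (k = 1, 2, ...), each edge merges the clusters of its endpoints through union-find, and the labels are recorded after every rank. The selection rule in `pick_cluster_count`, which picks the cluster count greater than 1 that persists over the most consecutive ranks, is a hypothetical stand-in for the thesis's strategy, and both function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
History = List[Tuple[int, List[int]]]


def knn_merge_history(points: List[Point], k_max: int) -> History:
    """Add directed kNN edges rank by rank and record labels after each rank.

    Assumption: an edge i -> j simply joins the clusters of i and j; the
    direction only determines which edges exist at rank k. Needs >= 2 points.
    """
    n = len(points)
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j])) for i in range(n)]
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    history: History = []
    for k in range(1, min(k_max, n - 1) + 1):
        for i in range(n):  # edge from i to its k-th nearest neighbour
            a, b = find(i), find(order[i][k - 1])
            if a != b:
                parent[a] = b
        roots = sorted({find(i) for i in range(n)})
        history.append((k, [roots.index(find(i)) for i in range(n)]))
    return history


def pick_cluster_count(history: History) -> Tuple[int, List[int]]:
    """Hypothetical rule: the cluster count > 1 that persists longest over k."""
    best_len, best = 0, history[-1][1]
    run_len, prev = 0, None
    for _, labels in history:
        c = max(labels) + 1
        run_len = run_len + 1 if c == prev else 1
        prev = c
        if c > 1 and run_len > best_len:
            best_len, best = run_len, labels
    return max(best) + 1, best
```

On the two-grid example used above, the component counts are 2, 2, 2, 1, 1, 1 for k = 1 to 6, so the rule returns two clusters. Because merges only ever reduce the count, the labels are identical throughout a plateau, which is why any labeling from the longest run can be returned.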