Font Size: a A A

Study On Clustering Algorithm Based On Natural Neighbor Graph And Internal Evaluation Index Of Noise Removal

Posted on:2022-08-18Degree:MasterType:Thesis
Country:ChinaCandidate:L T LiFull Text:PDF
GTID:2518306536475494Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the advent of the era of big data,the scale of the data is more and more big,the structure of the data is more complex,and then to the class data collection is becoming more and more convenient,data sets of large-scale and complex category label is very difficult,how from complex data mining without a label out valuable information become the research focus of unsupervised learning.Clustering analysis is considered to be the most important technology in the field of machine learning,and is widely used in image segmentation,pattern recognition,document clustering,social networking and other fields.Clustering,as an unsupervised learning task,needs to verify the quality of segmentation results,otherwise it is difficult to use different clustering results.Therefore,the clustering verification index is very important in the field of clustering analysis to evaluate and verify the clustering results.Therefore,the internal clustering index,which does not need external information,has become a research hotspot.The traditional partition-based clustering algorithm is not suitable for clusters of arbitrary shape,but the traditional density-based clustering algorithm is very sensitive to parameter setting.Graph-based clustering algorithm has always been a research hotspot in the field of clustering analysis because it can deal with data sets with complex shapes.The traditional graph-based clustering algorithm,MST clustering algorithm and spectral clustering algorithm are the most representative of graph-based clustering algorithm,but these algorithms are extremely sensitive to noise data.In order to solve these problems,this thesis proposes a new point-cutting clustering algorithm,Cut PC,which is based on natural neighborhood graph and noise cutting.Firstly,a natural neighborhood graph is constructed based on the concept of natural neighbors,and the negative density and critical negative density are defined by the natural neighborhood graph.Then,the noise points were obtained by comparing the critical inverse density with the inverse density,and the noise was cut in the natural neighborhood graph.Finally,the natural neighbor search algorithm is used to obtain the final clustering results.Experiments show that the proposed Cut PC algorithm can deal with arbitrary data sets and identify noise without setting any parameters.Aiming at the problem that the traditional internal clustering index can't adapt to the arbitrary shape cluster and is very sensitive to noise,this thesis proposes an internal evaluation index NRCVI based on hierarchical clustering and noise removal.Firstly,noise points are obtained based on the inverse density and critical density defined by natural neighborhood graph,and a new data set is constructed by removing noise points from the original data set.Then,hierarchical clustering was used to construct hierarchical clustering tree in the new data set after removing noise points,and the intra-cluster compactness and inter-cluster separation degree of a single cluster were defined.On this basis,the internal clustering evaluation index NRCVI of a single cluster was obtained and the mean value of the whole data set avg NRCVI was calculated.Finally,based on the proposed index,an optimal NC determination method based on NRCVI is designed.The experimental results show that the internal evaluation index proposed in this thesis can adapt to any cluster shape and is insensitive to noise.
Keywords/Search Tags:Clustering, Natural neighborhood graph, Noise cutting, Internal evaluation index
PDF Full Text Request
Related items