Font Size: a A A

Study On Improved Adaptive Spectral Clustering Algorithm Based On Natural Neighbor

Posted on:2018-03-06Degree:MasterType:Thesis
Country:ChinaCandidate:P P FuFull Text:PDF
GTID:2348330533461358Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Clustering analysis is one of the hotspots in the current research.It is mainly to dividing the whole data set into clusters.The similarity between members from the same cluster is high,and the similarity between clusters is low.Through similar or dissimilarity analysis,mining potential useful business information.Now it is widely used in the field,such as statistics,bioengineering,social science,medical information processing.The clustering analysis method is various and is classified according to the idea.It can be divided into the following categories approximately: division clustering,hierarchical clustering,density clustering,mesh clustering and model clustering.Most of the classical clustering algorithms are ideal for clustering on convex data sets,but they are ineffective on non-convex data sets.The spectral clustering algorithm can avoid this problem.In any shape Data sets it can converge to global optimizations.In addition,it can handle high-dimensional data sets and it have been widely concerned by so many scholars.The spectral clustering algorithm is based on the algorithm of graph theory.According to the principle of different division,many classification criteria are proposed,which are the minimum cut set criterion,the ratio cut set criterion,the rule cut set criterion,the maximum minimum cut set criterion and so on.It is an NP problem to find the optimal solution of the criterion function.Therefore,it is relaxed to solve the spectral decomposition of Laplacian matrices.The spectral decomposition results correspond to the optimal solution.According to the different Laplacian matrices,the clustering steps are also slightly different.However,according to the experiments,it is proved that the clustering effect based on the matrix Lsym is the best,and the representative algorithm is the NJW algorithm.Therefore,A lot of research about improvement are on the basis after that.The Self-Tuning algorithm is a classic improvement.The main idea of this algorithm is replaceing the kernel parameters in the Gaussian function with the K-nearest neighbor distance,and specify a kernel parameter for each point,which can weaken the kernel parameters.it has achieved good results in some field.However,the algorithm is not ideal for multi-scale data sets,but also requires access to parameter values K and clustering numbers.Aiming at these problems,a lot of density-based improved algorithms are proposed.A representative idea is that the main idea is to introduce density information into similar matrix constructs to achieve a more perfect similarity measure,but the algorithm still needs Enter the value of K and the number of clusters.Natural neighborhood is a new concept of neighbor,the typical representative of the traditional neighbor is K-Nearest Neighbor and ?-Nearest Neighbor,both of which require input parameters by human,one is the positive integer K in the K-nearest neighbor,and the other Distance value ?,and in different data set the difference for parameter selecting is large.But in the natural neighborhood,without entering any parameters,automatically finding the distribution of each data points and the number of natural neighbors,and get a similar to the K value of the natural feature is supk,which can be used to solve their parameters in many algorithms.A natural neighbor adaptive spectral clustering algorithm is proposed.Aiming at the shortcomings of the current spectral clustering algorithm,this algorithm applies natural neighborhood to spectral clustering algorithm.Firstly,the natural neighbor number nb,the natural eigenvalue supk,the supk neighborhood set NN of each data point,and the natural neighbor set RNN of each data point are obtained by natural neighbor algorithm.And then using this information improve the structure of similar matrix,whiich weaking the impact of nuclear parameters on the clustering effect;and using near-neighbor propagation information expand in the whole neighbor,to get a stable number of clusters.After running the spectrum clustering algorithm,and obtaining the clustering results.In experimental,the algorithm of this paper is compared with NJW,Self-Tuning and IASCBDA algorithm.The suitable parameters are entered in three kinds of algorithms,The clustering effect of NN-ASP is the most ideal.A natural de-noising algorithm(NN-NR)is proposed.In order to solve the problem that the current adaptive spectral clustering algorithm is ineffective in the sample space with noise points,the algorithm defines a new density concept,which is the ratio of nb and the sum of the distance of the nearest neighbor.It represents a local Density,which is similar to the local reachability from LOF.According to the LOF algorithm,we can get NLOF value of every points,and draw the sorted NLOF curve,Then,we can find the demarcation point and remove the noise point.The experimental is validated on six artificial data sets,and the NN-NR algorithm is compared with LOF algorithm.The NN-NR algorithm is not only superior to LOF,but also faster in time.
Keywords/Search Tags:Spectral Clustering, Natural Neighbor, Adaptive, Noise Removal
PDF Full Text Request
Related items