Font Size: a A A

Research On Constrained Spectral Clustering Algorithms

Posted on:2020-01-21Degree:MasterType:Thesis
Country:ChinaCandidate:X Y WangFull Text:PDF
GTID:2428330590452091Subject:Software Engineering Technology
Abstract/Summary:PDF Full Text Request
The spectral clustering algorithm is a machine-learning algorithm based on the theory of spectral partitioning.It is based on the theory of spectral graphs.Its essence is to transform the clustering problem into the optimal partitioning problem of graphs.Compared with the traditional clustering algorithm,it has the advantage of being able to cluster on the sample space of arbitrary shape and converge to the global optimal solution,and can obtain the global optimal solution in the relaxed continuous domain.However,some shortcomings of traditional spectral clustering algorithms cannot be ignored.First,it is difficult to correctly find clusters with large differences in density.The selection of parameters depends on multiple experiments and personal experience.Second,in the traditional spectral clustering algorithm,whether it is a parametersensitive problem or a difficult number of clusters,it will lead to poor clustering robustness.In the clustering process,the researchers determine the numerical values of the parameters and the number of clusters after many years of experiments and multiple attempts.Different parameters or different cluster numbers may make the clustering results greatly different.In addition,in the traditional constrained clustering algorithm,the random query strategy will bring the instability of the clustering results.In this thesis,the spectral clustering algorithm is studied for these problems.The specific contents are as follows:To solve the problem that spectral clustering algorithm has poor clustering effect for clusters with large density difference,this thesis establishes a pair of constraint groups,namely must-link and cannot link,by adding the concept of pair constraint to guide the spectral clustering process,and this constraint group is used to describe the relationship between two samples.In the constraint group,must-link means that two samples belong to the same cluster,while cannot-link means that these two samples belong to different clusters.Combined with the thought of such a semi-supervised Clustering,this article through to the traditional Spectral Clustering algorithm to improve a pair constraint Spectral Clustering algorithm Based on Shared neighbor PCSC-SN(Pairwise Constrained Spectral Clustering based on Shared on Neighborhood).PCSC-SN algorithm uses Shared neighbor to measure the similarity between data pairs,and uses active constraint information to find the relationship between two data points.Experimental results on artificial datasets show that this algorithm can achieve better clustering results.In order to solve the problem of parameter sensitivity,difficulty in determining the number of clusters and instability caused by random query strategy in constrained spectral clustering algorithm,an active constrained spectral clustering ACSCHM algorithm based on Hessian matrix is proposed in this thesis.The algorithm uses Hessian matrix instead of Laplacian matrix to solve the problem of parameter sensitivity.The number of eigenvectors of eigenvalues is used to estimate the number of clusters,which effectively solves the problem that the number of clusters is difficult to determine.In this thesis,active query strategy is used to replace the original random query strategy,which overcomes the instability of clustering results caused by random query and enhances the robustness of the algorithm.The experimental results show that this algorithm can effectively improve the clustering accuracy.
Keywords/Search Tags:Semi-supervised clustering, spectral clustering, Bethe Hessian matrix, pairwise constraints, shared neighbors
PDF Full Text Request
Related items