Font Size: a A A

Research On Spectral Clustering Algorithm Based On Local Density

Posted on:2021-01-18Degree:MasterType:Thesis
Country:ChinaCandidate:C Y YuanFull Text:PDF
GTID:2428330614958474Subject:Computer technology
Abstract/Summary:PDF Full Text Request
As an important branch in data mining,Clustering can find useful knowledge from a lot of data under unsupervised circumstances,thus attracting extensive attention from scholars and generating a large number of theories and methods.As a kind of clustering algorithm,spectral clustering convert the division of the datasets to the division of the graph.Since the data samples are mapped to eigenvectors of the Laplacian matrix for clustering,the algorithm can recognize any shape cluster of sample data.An objective function of graph cut is taken in the traditional spectral clustering,even a small number of noise points may greatly affect the clustering quality.The spectral averagely-dense clustering takes the sum of the maximum average node degrees as the objective function to optimize all clusters,reducing the impact of noise points on the clustering results,and the clustering quality is stable.The spectral averagely-dense clustering maintains the advantages of the traditional spectral clustering and make up its shortcomings.The main research work of the thesis includes:1.Aiming at the problem that the neighbor parameter N needs to be set for the shared nearest neighbors method,the thesis proposes a dynamic shared nearest neighbors method to dynamically determine the neighbor parameter N.This method draws on the design idea of h-index and takes the intersection of the transposed the sample distribution function and the ascending curve to dynamically determine the parameter N,which avoids the blindness of manual selection.2.Aiming at the problem that the spectral averagely-density clustering is sensitive to the parameter ?,the thesis proposes a spectral averagely-dense clustering based on dynamic shared nearest neighbors.First,a self-adaptive distance measure is designed based on the Gaussian kernel.Because the self-adaptive distance measure considers the degree of sparsity around the data points,it can handle clusters of different densities.Then,this self-adaptive distance measure is combined with dynamic shared nearest neighbors method to construct a similarity measure based on dynamic shared nearest neighbors,and this similarity measure fully considers the local density around the data points.Finally,the ?-neighberhood gragh of the the spectral averagely-density clustering is replaced by the fully connected graph,and the similarity measure based on dynamic shared nearest neighbors is used to calculate the weight of fully connected graph,whitch avoids the setting of the parameter ?.3.Aiming at the problem of the influence of the number selection of clusters on the clustering results.The thesis proposes to apply the eigengap method to the improved the spectral averagely-density clustering.The perturbation theory is used to analyze the impact of perturbation on the eigenvalues from three situations and the relationship between the similarity measure and perturbation.Through the above analysis,it is found that the similarity measure based on dynamic shared nearest neighbors that the thesis proposes and can effectively reduce the perturbation,and it is very suitable for the eigengap method to determine the number of clusters.
Keywords/Search Tags:spectral averagely-dense clustering, fully connected gragh, shared nearest neighbors, sample distribution function, eigengap
PDF Full Text Request
Related items