
Research On Spectral Clustering With Improved Similarity Measure

Posted on: 2013-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2248330371997457
Subject: Systems analysis and integration
Abstract/Summary:
Cluster analysis is an important research topic in data mining and pattern recognition. Spectral clustering, which is based on spectral graph theory, is a major research topic in machine learning. Compared with traditional clustering algorithms such as K-means and EM, spectral clustering has several advantages: it is less prone to falling into local optima, and it can identify non-convex cluster structures, so it can handle clustering problems on sample spaces of arbitrary shape.

Although spectral clustering is a very competitive clustering algorithm, research on it is still at an early stage. The similarity measure used in spectral clustering is one of the current research hotspots. Because spectral clustering is very sensitive to the Gaussian kernel parameter, most current studies focus on how to select that parameter. Starting instead from the properties of the data sample space itself, this thesis presents a similarity measure that reflects the global consistency of the data. On the one hand, spectral clustering with the proposed similarity measure avoids the kernel-parameter sensitivity of traditional spectral clustering. On the other hand, it can solve multi-scale clustering problems that traditional spectral clustering cannot. The algorithm is especially effective on clustering problems in which the clusters differ markedly in density. Our study also found that this similarity measure, combined with pairwise constraint information, can extend the prior information. Experiments show that a semi-supervised spectral clustering algorithm based on the proposed similarity measure obtains very good clustering results with only a small number of pairwise constraints.

In Chinese text mining, text data are high-dimensional and sparse, so many clustering algorithms are not suitable for text clustering. Spectral clustering depends only on the number of samples and is independent of the dimension, so it can avoid the singularity problem caused by high-dimensional feature vectors. In this thesis, spectral clustering is applied to Chinese text mining. Latent semantic analysis is used to reduce the dimensionality of the text vectors, and we analyze the effect of different dimensions on the spectral clustering algorithm.

Based on the above discussion, the main work of this thesis is as follows: (1) based on the properties of the data sample space itself, we propose a new similarity measure that reflects the consistency characteristic of data clustering and apply it to the spectral clustering algorithm; (2) we combine the new similarity measure with pairwise-constraint prior information and apply it to a semi-supervised spectral clustering algorithm; (3) we study the application of spectral clustering in Chinese text mining;...
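
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of traditional spectral clustering with a Gaussian-kernel similarity, i.e. the parameter-sensitive variant, not the similarity measure proposed in the thesis (which the abstract does not specify). The function names, the two-ring toy data, and the value sigma = 0.5 are assumptions chosen for illustration. To keep the sketch deterministic, it handles only the two-cluster case and splits by the sign of the second eigenvector of the normalized Laplacian instead of running K-means on several eigenvectors.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Gaussian-kernel similarity W_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W


def spectral_bipartition(W):
    """Two-way split by the sign of the second eigenvector of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]  # rescale back to the random-walk form
    return (fiedler > 0).astype(int)


def two_rings(n_per_ring, r_inner=1.0, r_outer=4.0):
    """Hypothetical toy data: two concentric rings, a non-convex structure."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    ring = np.column_stack([np.cos(theta), np.sin(theta)])
    return np.vstack([r_inner * ring, r_outer * ring])


if __name__ == "__main__":
    X = two_rings(40)
    # sigma is the Gaussian kernel parameter the abstract calls sensitive;
    # 0.5 is an assumed value that suits this particular toy data set.
    labels = spectral_bipartition(gaussian_affinity(X, sigma=0.5))
    print("inner ring labels:", set(labels[:40].tolist()))
    print("outer ring labels:", set(labels[40:].tolist()))
```

Rerunning the example with a much larger or much smaller sigma is a simple way to observe the sensitivity described above: the similarity graph changes, and the partition may no longer follow the two rings.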
Keywords/Search Tags:Spectral clustering, Similarity measure, Text clustering