Research On Text Clustering Algorithm Based On Spectral Clustering

Posted on: 2016-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhang
GTID: 2308330479955436
Subject: Computer application technology
Abstract/Summary:
Text clustering is an unsupervised machine learning method and a major research topic in natural language processing. It has become a necessary step in effectively organizing, summarizing, and navigating text information. Spectral clustering is a clustering algorithm that is attracting growing research interest and has a wide range of applications. Its theoretical basis is the spectral theory of graph partitioning. Compared with k-means, the EM algorithm, and other traditional clustering algorithms, spectral clustering can identify sample data with non-convex distributions: it can be applied to sample spaces of arbitrary shape and distribution and converges to a globally optimal solution.
This paper describes in detail the key technologies of text clustering, the basic theory of spectral clustering, and the classical spectral clustering algorithms. Building on an in-depth study of the literature on spectral clustering in China and abroad, it proposes improving the traditional spectral clustering algorithm through the way the similarity matrix is constructed.
When the traditional spectral clustering algorithm constructs the similarity matrix, it almost always uses a distance-based text similarity measure. This paper analyzes the defects of that approach, proposes a similarity measure based on K nearest neighbors, and incorporates it into spectral clustering to obtain the KNNSC algorithm. Because the traditional spectral clustering algorithm is sensitive to the order of the input data, the paper also replaces the final k-means step of spectral clustering with particle swarm optimization. Introducing particle swarm optimization into KNNSC yields the PSO-KNNSC algorithm.
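The abstract does not give the exact definition of the K-nearest-neighbor similarity measure, so the following is only a minimal sketch of one common construction, not the thesis's KNNSC method. It assumes cosine similarity between document vectors, keeps an edge between two documents when either one is among the other's k most similar documents, and then forms the unnormalized graph Laplacian L = D - W that spectral clustering would decompose. The function names (`cosine`, `knn_similarity_matrix`, `unnormalized_laplacian`) and the toy vectors are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_similarity_matrix(vectors, k):
    """Symmetric k-nearest-neighbor similarity matrix (assumed construction).

    W[i][j] keeps the cosine similarity only when j is among the k most
    similar documents to i, or i is among the k most similar to j.
    All other entries, including the diagonal, are 0.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # For each document, record the indices of its k most similar documents.
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )
        neighbors.append(set(others[:k]))

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbors[i] or i in neighbors[j]):
                W[i][j] = sim[i][j]
    return W


def unnormalized_laplacian(W):
    """Graph Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [
        [(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy document vectors: documents 0 and 1 are similar, as are 2 and 3.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.9, 1.0],
]
W = knn_similarity_matrix(docs, k=1)
L = unnormalized_laplacian(W)
```

In a full spectral clustering pipeline, the eigenvectors of L belonging to its smallest eigenvalues would then be clustered, classically with k-means; the abstract states that PSO-KNNSC replaces that last step with particle swarm optimization. The eigendecomposition is left out here because it would normally come from a linear algebra library.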
In the experimental part, the paper first completes corpus selection, text preprocessing, text feature selection, and construction of the vector space model of the texts. It then runs repeated clustering experiments with the k-means, KNNSC, and PSO-KNNSC algorithms. The experimental results show that the improved algorithms are effective and achieve better clustering results.
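As a rough illustration of the vector space representation step, the sketch below builds TF-IDF vectors from documents that have already been tokenized. The abstract does not state which term weighting or corpus the thesis uses, so the TF-IDF formula (term frequency divided by document length, times log(N / document frequency)), the function name `tfidf_vectors`, and the three sample documents are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, TF-IDF vectors) for a list of token lists."""
    vocab = sorted({token for doc in tokenized_docs for token in doc})
    n_docs = len(tokenized_docs)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) for token in vocab}

    vectors = []
    for doc in tokenized_docs:
        term_counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([term_counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical, already preprocessed corpus (tokenized, lower-cased).
corpus = [
    "spectral clustering uses graph eigenvectors".split(),
    "text clustering groups similar documents".split(),
    "particle swarm optimization searches for cluster centers".split(),
]
vocab, vectors = tfidf_vectors(corpus)
```

Each row of `vectors` is one document in the vector space model. Such rows could be passed to a similarity computation or to a clustering algorithm such as k-means.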
Keywords: Text Clustering, Spectral Clustering, Similarity Measure, K-Nearest Neighbor, Particle Swarm Optimization Algorithm