Spectral Clustering Algorithms and Their Application in Text Clustering

Posted on: 2014-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: L J Sun
Full Text: PDF
GTID: 2248330395483046
Subject: Systems Engineering
Abstract/Summary:
Spectral clustering, a relatively new and effective class of clustering methods, has become a major research topic in machine learning. Compared with traditional clustering algorithms, spectral clustering can handle sample spaces of arbitrary shape and converge to the globally optimal solution. Building on existing research on spectral clustering, this thesis improves the algorithm and implements a Chinese text clustering system. The main research results are as follows:
(1) Because discretization, as a preprocessing step, contributes greatly to data mining, it is combined with density-sensitive spectral clustering. The original data are first normalized and discretized. The distance between data points is then measured by Hamming distance, and a density-sensitive similarity measure is constructed from it and introduced into spectral clustering. The result is the DSSCCAT algorithm. Experimental results show that combining discretization with density-sensitive spectral clustering is feasible.
(2) To alleviate the sensitivity of DSSCCAT to initialization, the SCO-DSSCCAT algorithm is proposed. It uses the SCO (stem cells optimization) algorithm to search the clustering space for the globally optimal solution, which then serves as the initialization. Experimental results show the advantages of SCO-DSSCCAT: insensitivity to the scale parameter, fast convergence, and high clustering stability.
(3) To reduce the computational cost of the density-sensitive similarity measure, the Two-Phase Clustering (TPC) algorithm is proposed. First, the fast global k-means algorithm is used to construct representatives of the original dataset. Then, SCO-DSSCCAT is used to cluster the representatives. Finally, the clustering results of the two phases are combined to obtain the final result. Experimental results show that TPC performs better in both clustering accuracy and efficiency.
Finally, a Chinese text clustering system is implemented on the basis of the improved spectral clustering algorithms. Experimental results show that SCO-DSSCCAT and TPC are effective for text clustering.
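The similarity construction described in item (1) of the abstract can be illustrated with a small sketch. The abstract does not give the thesis's exact formulas, so everything specific here is an assumption: equal-width binning for the discretization step, and a density-adjusted edge length of rho**d - 1 with similarity 1 / (D + 1), where D is the shortest-path distance under those adjusted lengths, which is one formulation seen in density-sensitive clustering work. The function names and the parameter rho are hypothetical, and the spectral embedding step itself is left out.

```python
def discretize(data, n_bins=4):
    """Min-max normalize each column, then map it to an equal-width bin index.

    Equal-width binning is an assumption; the abstract does not name a scheme.
    """
    binned_columns = []
    for col in zip(*data):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        binned_columns.append(
            [min(int((v - lo) / span * n_bins), n_bins - 1) for v in col]
        )
    return [list(row) for row in zip(*binned_columns)]


def hamming(a, b):
    """Number of attributes on which two discretized points differ."""
    return sum(x != y for x, y in zip(a, b))


def density_sensitive_similarity(points, rho=2.0):
    """Similarity matrix from shortest paths under density-adjusted lengths.

    Assumed formulation: edge length rho**d - 1 (d = Hamming distance),
    similarity 1 / (D + 1) with D the shortest-path distance.
    """
    n = len(points)
    dist = [
        [rho ** hamming(points[i], points[j]) - 1.0 for j in range(n)]
        for i in range(n)
    ]
    # Floyd-Warshall: many short hops through dense regions can beat
    # one long direct hop, because the adjusted length grows exponentially.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][k] + dist[k][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return [[1.0 / (d + 1.0) for d in row] for row in dist]


# Four already-discretized points forming a chain of single-attribute changes.
chain = [[0, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1]]
similarity = density_sensitive_similarity(chain, rho=2.0)
print(similarity[0][3])
```

In this toy chain the first and last points differ in all three attributes, so the direct adjusted length is 2**3 - 1 = 7, while the path through the two intermediate points costs 1 + 1 + 1 = 3; the resulting similarity is 1 / (3 + 1) = 0.25 rather than 1 / 8. A matrix built this way could then be passed to any spectral clustering routine that accepts a precomputed affinity.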
Keywords/Search Tags: spectral clustering, text clustering, discretization, similarity measure, stem cells optimization