Research Of Text Clustering Based On NMF Algorithm

Posted on:2015-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Ju

Full Text:PDF

GTID:2298330422987405

Subject:Computer application technology

Abstract/Summary:

As one of the most important research topics in data mining and pattern recognition, clustering analysis has been widely used in areas such as data compression, document clustering, information retrieval, and image segmentation. In recent years, with the rapid growth of online information, document clustering has become increasingly important in fields such as information retrieval and memory management. Text data is high-dimensional and sparse, so many clustering algorithms cannot be applied to it directly; in addition, the massive scale of text collections demands highly efficient clustering algorithms.

The vector space model is the traditional model for representing text documents as vectors. Because document vectors are high-dimensional and sparse, this thesis uses the non-negative matrix factorization (NMF) algorithm. NMF is a new method for feature extraction. Because the factorization results are constrained to be non-negative, NMF-based features reflect more localized characteristics of the samples. The feature vectors extracted by NMF are therefore easier to interpret and to use for prediction.

This thesis introduces the basic ideas and basic algorithms of non-negative matrix factorization. Because the NMF algorithm converges slowly and tends to converge to poor solutions, this thesis improves it by using the fuzzy c-means (FCM) algorithm for initialization. Second, because text collections are large, the demands on the clustering algorithm are even more stringent: the standard k-means algorithm must compute the distance from every sample point to all cluster centers in each iteration, which wastes a great deal of computation time, especially when the amount of data is very large. To address this problem, this thesis proposes an improved k-means algorithm. In addition, many clustering algorithms require the number of clusters to be specified before clustering, although it is not known in advance; for this case, the thesis proposes a new clustering algorithm, FGClus. Experiments show that the improved k-means algorithm and the proposed FGClus algorithm are effective.

Finally, this thesis combines NMF and the improved NMF with the k-means algorithm, the improved k-means algorithm, and the proposed FGClus algorithm. The experimental results show that, for high-dimensional sparse text vectors, NMF combined with a clustering algorithm outperforms direct use of the clustering algorithm, and that the improved NMF algorithm not only produces more accurate clustering results but also improves the efficiency of the algorithm.
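
The baseline pipeline described in the abstract (reduce the document vectors with NMF, then cluster the reduced vectors with k-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: NMF here uses the well-known multiplicative update rules with random initialization rather than the FCM initialization the thesis proposes, k-means is the standard algorithm with a simple farthest-first seeding chosen only to keep the example deterministic, and the toy document-term matrix `V` and all function names are hypothetical. The improved k-means and FGClus algorithms are not shown because the abstract does not specify them.

```python
import random


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative V (docs x terms) by W (docs x rank) times H (rank x terms).

    Uses multiplicative updates for the squared-error objective.
    Assumption: W and H start from random positive values, not from FCM.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50):
    """Standard k-means; farthest-first seeding is an illustrative choice."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(list(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = []
    for _ in range(iterations):
        # Assignment step: every point is compared against every center,
        # which is the per-iteration cost the abstract refers to.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy document-term matrix: rows are documents, columns are terms.
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [3, 3, 2, 0, 0, 0],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 3, 3],
    [0, 0, 0, 1, 2, 2],
]
W, H = nmf(V, rank=2)      # each row of W is a 2-dimensional document representation
labels = kmeans(W, k=2)    # cluster documents in the reduced space
print(labels)
```

In this toy matrix the first three and the last three documents share no terms, so the two groups are what one would expect the clustering to recover. Clustering the rows of `W` instead of the rows of `V` is the point of the combination: k-means then works on `rank` dimensions rather than on the full vocabulary.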

Keywords/Search Tags:

Text clustering, non-negative matrix factorization, clustering analysis, text cluster integration, k-means algorithm

Related items

1	Research On Key Techniques In Text Mining
2	Research And Implementation Of Text Clustering Based On Dk-means
3	Research And Implementation Of Text Clustering Based On DK-Means
4	Research On Recommendation System Based On Non-negative Matrix Factorization And Clustering Algorithm
5	Cluster Analysis Application And Research Of Text Mining
6	Design And Implementation Of Distributed Text Clustering System Based On K-means
7	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
8	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering
9	Clustering Algorithm Based On Robust Non-negative Matrix Factorization
10	Research And Implementation Of Text Clustering Based On Fuzzy C-Means Clustering Algorithm