
Non-negative Sparse Signal Analysis Theory And Text Clustering Applications

Posted on: 2007-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: C F Yang
Full Text: PDF
GTID: 2208360185456067
Subject: Computer software and theory
Abstract/Summary:
Document clustering techniques have received increasing attention as a fundamental and enabling tool for the efficient organization, navigation, retrieval, and summarization of huge volumes of text documents. The aim of document clustering is to group documents into different semantic classes in an unsupervised manner. The document space is generally high-dimensional, and clustering in such a space is often infeasible because of the curse of dimensionality. The primary step in document clustering is therefore to project the documents into a lower-rank semantic space in which documents related to the same semantics are close to each other. Unlike other rank reduction methods, such as PCA (Principal Component Analysis) and VQ (Vector Quantization), NMF (Nonnegative Matrix Factorization) yields nonnegative, sparse basis vectors, which make a parts-based representation possible. If a document is viewed as a combination of basis topics, and every basis topic is represented by a basis vector, then the document can be categorized as belonging to the topic represented by its principal vector. Thus, NMF can be used to organize document collections into partitioned structures or clusters derived directly from the nonnegative factors.
In this thesis, we mainly use SNMF (Sparse Nonnegative Matrix Factorization) as the rank reduction method; SNMF extends NMF with the option to control sparseness explicitly. We also optimize the method by initializing SNMF with spherical k-means and NNLS (Non-Negative Least Squares). By combining SNMF with LPI (Locality Preserving Indexing), we obtain better clustering accuracy. Unlike previous document clustering methods based on NMF, our methods try to discover both the geometric and the discriminating structures of the document space in an unsupervised manner, achieving high accuracy at an acceptable computational cost.
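The clustering rule described above, in which each document is assigned to the topic of its principal basis vector, can be illustrated with a minimal sketch. It is not the thesis's SNMF method: it uses plain NMF with the standard multiplicative update rules, reads "principal vector" as the basis vector with the largest coefficient for that document (an assumption), and runs on a made-up terms-by-documents matrix. The names `nmf`, `assign_clusters`, and `V` are hypothetical, and the code is pure Python only to stay self-contained; a numerical library would be the usual choice in practice.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate nonnegative V (terms x docs) as W (terms x k) times H (k x docs).

    Uses multiplicative updates for the squared-error objective. This is
    plain NMF, not the sparse variant (SNMF) used in the thesis.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start so the updates cannot get stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def assign_clusters(H):
    """Label each document (column of H) with the index of its largest coefficient."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*H)]


# Toy data (made up): rows are terms, columns are documents.
# Documents 0 and 1 use only terms 0-2; documents 2 and 3 use only terms 3-5.
V = [
    [2, 4, 0, 0],
    [1, 2, 0, 0],
    [3, 6, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
    [0, 0, 1, 3],
]

W, H = nmf(V, k=2)
labels = assign_clusters(H)
print(labels)
```

On this toy matrix the labels should place documents 0 and 1 in one cluster and documents 2 and 3 in the other. Which cluster receives index 0 depends on the random initialization, so only the grouping is meaningful.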
Our experimental evaluations show that our methods surpass NMF not only in the easy and reliable derivation of document clustering results, but also in document clustering accuracy. Finally, we propose a novel method named iNMF (increment NMF) in...
Keywords/Search Tags: SNMF, Document clustering, Spherical k-means, LPI, Increment NMF