Non-negative Sparse Signal Analysis Theory And Text Clustering Applications

Posted on:2007-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:C F Yang

Full Text:PDF

GTID:2208360185456067

Subject:Computer software and theory

Abstract/Summary:

Document clustering has received increasing attention as a fundamental, enabling tool for the efficient organization, navigation, retrieval, and summarization of huge volumes of text documents. Its aim is to group documents into semantic classes in an unsupervised manner. The document space is generally high-dimensional, and clustering directly in such a space is often infeasible because of the curse of dimensionality. The primary step in document clustering is therefore to project the documents into a lower-rank semantic space in which documents related to the same semantics lie close to each other. Unlike other rank-reduction methods, such as PCA (Principal Component Analysis) and VQ (Vector Quantization), NMF (Nonnegative Matrix Factorization) yields nonnegative, sparse basis vectors, which make a parts-based representation possible. If a document is viewed as a combination of basis topics, and every basis topic is represented by a basis vector, then the document can be assigned to the topic represented by its principal vector. Thus, NMF can be used to organize document collections into partitions or clusters derived directly from the nonnegative factors.
In this thesis, we mainly use SNMF (Sparse Nonnegative Matrix Factorization) as the rank-reduction method; SNMF extends NMF with the option to control sparseness explicitly. We also optimize the method by initializing SNMF with spherical k-means and NNLS (Non-Negative Least Squares). By combining SNMF with LPI (Locality Preserving Indexing), we obtain better clustering accuracy. Unlike previous NMF-based document clustering methods, our methods try to discover both the geometric and the discriminating structures of the document space in an unsupervised manner, achieving high accuracy at an acceptable computational cost.
Our experimental evaluations show that our methods outperform NMF not only in the ease and reliability of deriving document clustering results, but also in clustering accuracy. Finally, we propose a novel method named iNMF (increment NMF) in...
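
The NMF-based assignment described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: it uses plain NMF with the standard multiplicative updates rather than SNMF, with random initialization instead of spherical k-means and NNLS, and without LPI. It also assumes that the "principal vector" of a document is the basis vector with the largest coefficient. The function names `nmf` and `cluster_documents` and the toy matrix are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (terms x docs) as W @ H.

    Plain NMF with multiplicative updates for the Frobenius loss;
    an assumption for illustration, not the thesis's SNMF.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps   # basis topics, one per column
    H = rng.random((k, n_docs)) + eps    # per-document topic weights
    for _ in range(n_iter):
        # Both updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_documents(V, k, **kwargs):
    """Assign each document to the basis topic with the largest weight."""
    W, H = nmf(V, k, **kwargs)
    # Rescale H by the column norms of W so weights are comparable
    # across topics regardless of how W and H share the scale.
    H_scaled = H * np.linalg.norm(W, axis=0)[:, None]
    return H_scaled.argmax(axis=0), W, H


# Toy term-document matrix: terms 0-2 occur only in documents 0-2,
# terms 3-5 only in documents 3-5 (made-up counts).
V = np.array([
    [4, 2, 3, 0, 0, 0],
    [2, 1, 2, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 3, 3],
], dtype=float)

labels, W, H = cluster_documents(V, k=2)
print(labels)
```

The cluster labels come straight from the nonnegative factor H, so no separate clustering step is run on the reduced space. Which of the two label values each group receives depends on the random initialization.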

Keywords/Search Tags:

SNMF, Document clustering, Spherical k-means, LPI, Increment NMF

Related items

1	Based On K-means The Chinese Text Clustering Algorithm
2	Research On Document Clustering Algorithm Based On K-means
3	Study of document clustering using the k-means algorithm
4	Multi-document Summarization Based On Improved Fuzzy C-means Clustering Algorithm
5	A Distributed Indexing Method Of Large Scale Document Set Based On Clustering
6	Research Of Document Clustering For User Interest
7	Document Topic Clustering Analysis Based On Improved K-means Method
8	FCM Clustering And Research Of Its Increment Algorithm
9	Research On Efficient Document Clustering Using Improvised Sub-Document Based Framework
10	A Study Of Chinese Multi-document Summarization Based On Adaptive Clustering Algorithm