Exploring Data Clustering with Non-negative Matrix Factorization Models

Posted on:2016-08-21

Degree:Ph.D

Type:Thesis

University:Drexel University

Candidate:Xiong, Zunyan


GTID:2478390017477758

Subject:Computer Science

Abstract/Summary:

The clustering problem has been widely studied in data mining and machine learning, with numerous applications in pattern recognition, information retrieval, image analysis, bioinformatics and other fields. Clustering is a fundamental unsupervised learning technique that aims to partition a data set into groups based on the similarity of its elements. Recently, there has been significant progress in the use of non-negative matrix factorization (NMF) methods for various clustering tasks. NMF finds two non-negative matrices whose product approximates the original non-negative data matrix. The non-negativity of the factors gives NMF an advantage over other matrix factorization methods because it makes the results much easier to interpret. Moreover, NMF has attracted much attention due to its recently discovered ability to solve challenging data mining and machine learning problems. Studies have shown that, under certain conditions, NMF is equivalent to kernel k-means and to probabilistic latent semantic indexing. Compared with most other clustering methods, NMF has been shown to achieve better or comparable clustering results.

In this thesis, our primary goal is to study the clustering problem by building NMF models that reflect the features of the given data. First, for the case in which the similarity of the data is available, we propose two modified NMF models, one with a constraint (CNMF) and the other with a regularization term (RNMF). We use this case as an example of how to model information about the data, and we compare these two commonly employed approaches (constraint versus regularization) in this simple setting. Next, we propose a novel model named augmented non-negative matrix factorization (ANMF). The novelty of the model is that it incorporates the geometric closeness of the data along both dimensions of the data matrix. In addition to experiments on benchmark data sets, the model is also applied to a real-world application, the CiteULike data set.
Finally, for data sets with sparse features, we propose a new model named sparse regularized non-negative matrix factorization (SpaNMF). Data of this type are ubiquitous in applications and have remained an active research topic for many years. The novelty here is to combine the geometric structure and the sparseness of the data. For all four models, we develop numerical algorithms and conduct experiments. The experimental results show the effectiveness of the proposed models compared with state-of-the-art clustering algorithms.
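The abstract does not give the formulations or update rules of CNMF, RNMF, ANMF or SpaNMF, so the following is only a generic illustration of the ideas it names, not the thesis's models. It is a minimal NumPy sketch, assuming samples are stored as the columns of a non-negative matrix X, of NMF computed with Lee-Seung-style multiplicative updates, plus two optional penalties of the kind described above: a graph-Laplacian term (in the spirit of graph-regularized NMF, GNMF) that rewards geometric closeness between neighbouring samples, and an L1 term that encourages sparse coefficients. The function names, the p-nearest-neighbour graph, the column-normalization heuristic and the default values of `lam`, `mu` and `p` are all hypothetical choices. Each sample is assigned to the cluster whose component has the largest coefficient in its column of H.

```python
import numpy as np


def knn_graph(X, p=5):
    """Symmetric 0/1 adjacency linking each sample (column of X) to its p nearest neighbours."""
    n = X.shape[1]
    p = min(p, n - 1)
    # Pairwise squared Euclidean distances between columns of X.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :p]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), p), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)  # make the graph undirected


def graph_sparse_nmf(X, k, lam=0.1, mu=0.01, p=5, n_iter=500, seed=0, eps=1e-10):
    """Hypothetical sketch: X (m x n, non-negative) ~ W @ H, W (m x k) >= 0, H (k x n) >= 0.

    Multiplicative updates aimed at
        0.5*||X - W H||_F^2 + 0.5*lam*tr(H L H^T) + mu*sum(H),
    with L = D - A the Laplacian of a p-nearest-neighbour graph on the samples.
    lam = mu = 0 reduces the loop to plain Lee-Seung NMF.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    A = knn_graph(X, p)
    D = np.diag(A.sum(axis=1))
    for _ in range(n_iter):
        # H step: negative gradient terms go in the numerator, positive ones in
        # the denominator, so the elementwise ratio keeps H non-negative.
        H *= (W.T @ X + lam * (H @ A)) / (W.T @ W @ H + lam * (H @ D) + mu + eps)
        # W step: the standard NMF update (no penalty acts on W).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Heuristic: give W unit-norm columns and rescale H so W @ H is unchanged;
        # otherwise the penalties on H could be shrunk simply by inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def nmf_cluster(X, k, **kwargs):
    """Assign each sample to the component with the largest coefficient in its column of H."""
    _, H = graph_sparse_nmf(X, k, **kwargs)
    return H.argmax(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: 4 features x 20 samples; samples 0-9 load on features 0-1,
    # samples 10-19 on features 2-3, plus small non-negative noise.
    X = np.zeros((4, 20))
    X[:2, :10] = 5.0
    X[2:, 10:] = 5.0
    X += 0.5 * rng.random(X.shape)
    print(nmf_cluster(X, k=2, p=3))
```

With `lam=0` and `mu=0` the loop reduces to plain multiplicative-update NMF, since the normalization step leaves the product W @ H unchanged. Because of that normalization heuristic, the sketch does not carry the monotone-decrease guarantee of the unmodified Lee-Seung updates.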

Keywords/Search Tags:

Clustering, Data, Matrix factorization, Model, Machine learning, NMF

Related items

1	Multi-View Clustering Based On Integrated Weight Learning And Non-Negative Matrix Factorization
2	Research On The Application Of Non-negative Matrix Factorization And AP Clustering Methods In Personalized Recommendation Systems
3	Nonnegative Matrix Factorization Algorithm Based On The Regularized Method And Its Applications
4	Study On Multi-label Learning Methods Based On Non-negative Matrix Factorization And Extreme Learning Machine
5	Non-Negative Collective Matrix Factorization Algorithm For Heterogeneous Co-Transfer Clustering
6	Algorithms For Structure-enforced Matrix Factorization With Applications
7	Manifold Regularized Matrix Factorization With Constrains And Its Applications In Image Clustering
8	Research On Subspace Learning Based Data Representation
9	Non-negative Matrix Factorization Based On Local Similarity Learning
10	Research On Structured Matrix Factorization Algorithm