Clustering is an effective method for data analysis that is widely used in machine learning, pattern recognition, data mining, and related fields. The goal of clustering is to divide a dataset into a number of disjoint clusters by measuring the similarity between data points, such that data in the same cluster are highly similar while data in different clusters are dissimilar. Partitioning a dataset correctly has attracted great interest from researchers, and a large number of clustering algorithms have been proposed over the past few decades. In recent years, many scholars have pointed out that matrix-based learning algorithms are effective clustering methods. Matrix learning includes matrix factorization, matrix recovery, subspace clustering, and so on. In general, matrix learning algorithms learn a low-rank matrix representation of the original data, and the resulting low-dimensional matrices are then used for clustering tasks.

This thesis studies semi-supervised low-rank matrix learning and its applications. To improve the learning quality of matrix learning and obtain better clustering performance, unsupervised matrix learning algorithms are extended to semi-supervised learning methods. By integrating a small amount of prior information about the dataset, semi-supervised matrix learning models achieve better learning performance. This thesis proposes semi-supervised non-negative matrix factorization algorithms from the two perspectives of hard constraints and soft constraints, together with a constrained local coordinate factorization algorithm and a constrained concept factorization algorithm. It also studies the clustering applications of these algorithms to natural images, face images, handwritten digit images, and text.

The main contributions of this thesis are as follows:

1. This thesis proposes a new semi-supervised non-negative matrix factorization model.
Non-negative matrix factorization (NMF) has proved to be a very effective clustering method. To further enhance its performance, this thesis proposes a novel semi-supervised non-negative matrix factorization algorithm that incorporates the graph Laplacian and the label information of a small number of samples into NMF. In natural image clustering experiments, the clustering accuracy and normalized mutual information obtained by the proposed algorithm are better than those of comparable classical algorithms.

2. This thesis presents a pairwise constrained non-negative matrix factorization method with graph Laplacian. The semi-supervised non-negative matrix factorization algorithm incorporates the label information of a small number of samples as constraints; this kind of constraint information can be called a hard constraint. In the clustering process, the dimensionality of the factorized matrices must then equal the number of clusters. Because this dimensionality cannot be set freely, the reconstruction error between the original matrix and the factorized matrices may be larger. To improve the applicability of the model and reduce the reconstruction error, this thesis incorporates pairwise constraints and the graph Laplacian into NMF and proposes a pairwise constrained non-negative matrix factorization method with graph Laplacian. Experimental results on image clustering show that this method obtains a smaller reconstruction error, so the product of the factorized matrices is a better approximation of the original matrix.

3. This thesis proposes a constrained non-negative local coordinate factorization algorithm. Sparse representation can enhance the robustness and performance of algorithms and has attracted increasing attention from researchers.
To obtain a sparse coefficient matrix, sparsity-enhancing regularization terms have been incorporated into non-negative matrix factorization algorithms. However, most of these sparse algorithms are unsupervised learning models. This thesis therefore proposes a constrained non-negative local coordinate factorization algorithm that takes into account the local geometric structure, the sparsity of the coefficient matrix, and prior information. Experimental results on image datasets show the effectiveness of the proposed method in comparison with state-of-the-art algorithms.

4. This thesis presents a constrained concept factorization algorithm. Concept factorization (CF) is a variant of non-negative matrix factorization. NMF is a linear learning method, and when the data are not linearly separable, its results are often less than ideal. Using a kernel function, CF can map data that are not linearly separable in the original space into linearly separable data in a transformed high-dimensional space. However, CF is an unsupervised learning method that does not consider any prior knowledge of the data. To enhance the clustering performance of CF, this thesis designs a new concept factorization objective function that incorporates pairwise constraint information, and develops an optimization scheme for this objective function to derive iterative updating rules. The computational complexity of the proposed algorithm is analyzed qualitatively, and a convergence proof is provided. Experimental evaluations on image and document datasets show that the proposed approach achieves good performance and outperforms other state-of-the-art methods.
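
For orientation, NMF combined with a graph Laplacian regularizer, the building block named in contributions 1 and 2, can be sketched as follows. This is a minimal sketch of the standard multiplicative-update scheme for graph-regularized NMF, not the semi-supervised algorithms proposed in this thesis: it uses no label or pairwise-constraint information. The function names (knn_graph, objective, gnmf), the 0/1 k-nearest-neighbour graph, and the parameters k, lam, n_iter, eps, and seed are assumptions introduced for illustration only.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix.

    X holds one sample per column (shape: features x samples).
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Squared Euclidean distances between all pairs of columns.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def objective(X, U, V, L, lam):
    """Reconstruction error plus graph smoothness penalty."""
    recon = np.linalg.norm(X - U @ V.T, "fro") ** 2
    smooth = np.trace(V.T @ L @ V)
    return recon + lam * smooth


def gnmf(X, n_components, W, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (m x n, non-negative) by U @ V.T with U, V >= 0.

    Assumed objective: ||X - U V^T||_F^2 + lam * Tr(V^T L V), L = D - W.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, n_components))
    V = rng.random((n, n_components))
    D = np.diag(W.sum(axis=1))  # degree matrix of the graph
    L = D - W                   # graph Laplacian
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        history.append(objective(X, U, V, L, lam))
    return U, V, history
```

Given a non-negative data matrix X with one sample per column, W = knn_graph(X, k=5) followed by U, V, history = gnmf(X, n_components, W) yields a coefficient matrix V with one row per sample. One simple way to read off cluster assignments is V.argmax(axis=1); running k-means on the rows of V is another common choice.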