Research On Key Technologies Of Co-Cluster And Co-Clustering Ensemble

Posted on: 2016-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: S D Huang
Full Text: PDF
GTID: 2308330461470236
Subject: Computer technology
Abstract/Summary:
Clustering is an important method in machine learning and data mining. It aims to partition data points into groups according to their similarity, so that points in the same cluster are more similar to one another than to points in different clusters. Co-clustering instead clusters the rows and columns of an input data matrix simultaneously. It overcomes several limitations of traditional clustering methods by automatically discovering similarity based on a subset of attributes, which can yield a better clustering result.

In this thesis, we present a robust graph regularized nonnegative matrix factorization (RGNMF) for clustering. It introduces a sparse error matrix into the data reconstruction function and applies the l1-norm to measure graph regularization errors. The error matrix alleviates the impact of outliers and noise, and the l1-norm reduces the influence of unreliable graph regularization errors. Accordingly, RGNMF is expected to obtain a robust regularization and to achieve a more faithful approximation of the data recovered from the sparse outliers. An iterative updating algorithm is proposed to solve the RGNMF optimization problem, and its convergence is guaranteed theoretically.

Different co-clustering models usually produce very different results, since each algorithm has its own bias from optimizing a different criterion. Combining different co-clustering results has emerged as an alternative approach to improving the performance of co-clustering algorithms. Like clustering ensembles, co-clustering ensembles provide a framework for combining multiple base co-clusterings of a dataset into a stable and robust consensus co-clustering. In this thesis, a novel co-clustering ensemble algorithm named spectral co-clustering ensemble (SCCE) is presented. SCCE performs the ensemble task on the base row clusters and column clusters of a dataset simultaneously and obtains an optimized co-clustering result. SCCE is a matrix-decomposition-based approach: the ensemble task is formulated as a bipartite graph partition problem, which is solved efficiently using selected eigenvectors. A novel co-clustering ensemble objective function is defined using the similarity between the base co-clusterings. In addition, a novel model based on a spectral algorithm is presented for the co-clustering ensemble, and the inference of SCCE is described in detail.

Finally, we run experiments on several real-world datasets, including UCI datasets, the Microsoft Research Asia Multimedia (MSRA-MM) image dataset, and document datasets. Experiments on these benchmark datasets demonstrate the effectiveness of RGNMF. Experiments on most of these benchmark datasets show that SCCE outperforms other co-clustering methods as well as many co-clustering ensemble algorithms. Moreover, the results indicate that SCCE achieves better co-clustering performance at a smaller computational cost.
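
The bipartite graph formulation mentioned in the abstract can be illustrated with the classic spectral co-clustering recipe for a single nonnegative data matrix: rows and columns are the two vertex sets, the matrix entries are edge weights, and the second singular vectors of the degree-normalized matrix give a joint partition. The following is a minimal sketch of that textbook technique, not the SCCE algorithm itself, since the abstract does not state its objective function or how the base co-clusterings are combined. The function name `spectral_cobipartition`, the restriction to two co-clusters, and the sign-based split are assumptions made for illustration.

```python
import numpy as np


def spectral_cobipartition(A):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Hypothetical helper for illustration only (not taken from the thesis).
    Rows and columns are treated as the two sides of a weighted bipartite graph.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row degrees
    d2 = A.sum(axis=0)  # column degrees
    if np.any(d1 <= 0) or np.any(d2 <= 0):
        raise ValueError("every row and column needs a positive sum")

    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

    # The leading singular pair is trivial (it only encodes the degrees);
    # the second pair carries the partition information.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)

    # Simplifying assumption: split by sign instead of running k-means
    # on the embedded values.
    row_labels = (z_rows > 0).astype(int)
    col_labels = (z_cols > 0).astype(int)
    return row_labels, col_labels


# Toy matrix with two noisy blocks: rows 0-2 with columns 0-1,
# and rows 3-4 with columns 2-3.
A = np.array([[5, 5, 0.1, 0.1]] * 3 + [[0.1, 0.1, 5, 5]] * 2)
rows, cols = spectral_cobipartition(A)
print(rows, cols)
```

Because singular vectors are only defined up to sign, the label values 0 and 1 may be swapped between runs or platforms; what matters is which rows and columns share a label. For more than two co-clusters, the usual extension keeps several singular vectors and clusters the stacked row and column embeddings with k-means.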
Keywords/Search Tags: Clustering, Co-clustering, Robust Graph Regularization, Spectral Co-clustering Ensemble