Model-based Semi-supervised Subspace Clustering Algorithm Analysis

Posted on: 2014-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2248330398950344
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is one of the key technologies in data mining and has a wide range of applications. As science advances, clustering of high-dimensional data is attracting growing attention from researchers. High-dimensional data are harder to cluster than low-dimensional data: because they are sparse, traditional clustering algorithms often fail to produce the desired results on such datasets. Subspace clustering is an effective approach to high-dimensional data. It extends traditional clustering to high-dimensional spaces, and its central idea is to search locally within the relevant subspaces. In high-dimensional data, different clusters are typically associated with different subsets of features, and the problem of finding clusters in their relevant subspaces is called subspace clustering. The problem is challenging because the search for subspaces and the detection of clusters depend on each other circularly: finding the relevant subspace requires knowing the cluster, and finding the cluster requires knowing its subspace. Existing approaches either enumerate all possible subspaces, which has exponential computational complexity, or rely on a locality assumption that does not hold in many situations. Breaking the dependence without resorting to one of these two strategies appears infeasible unless further information is available. In this work, we introduce additional information in the form of pairwise constraints to break the dependence. Our proposed Model-based Semi-supervised Subspace Clustering algorithm (MSSC) fully integrates the constraint information into the subspace search, defines a new optimization objective function, and thereby breaks the circular dependency. The properties of the algorithm are investigated, and its performance is evaluated experimentally on real and synthetic datasets. The results indicate that incorporating surprisingly few such constraints can greatly increase clustering accuracy, and that MSSC performs much better than other state-of-the-art subspace clustering algorithms.
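To give a sense of why pairwise constraints can help break the circular dependency, the following minimal sketch scores each feature directly from must-link and cannot-link pairs, without first needing a clustering. It is not the MSSC objective function, which the abstract does not spell out. The scoring rule, the function names `feature_scores` and `relevant_features`, and the toy data are all assumptions made purely for illustration.

```python
# Illustrative only: NOT the MSSC objective from the thesis.
# Assumed heuristic: a feature is "relevant" when cannot-link pairs are far
# apart along it and must-link pairs are close together along it.

def feature_scores(points, must_link, cannot_link):
    """Return one score per feature: mean squared gap over cannot-link
    pairs minus mean squared gap over must-link pairs."""
    n_features = len(points[0])

    def mean_sq_gap(pairs, d):
        # Average squared difference along feature d over the given pairs.
        if not pairs:
            return 0.0
        return sum((points[i][d] - points[j][d]) ** 2 for i, j in pairs) / len(pairs)

    return [mean_sq_gap(cannot_link, d) - mean_sq_gap(must_link, d)
            for d in range(n_features)]


def relevant_features(points, must_link, cannot_link):
    """Indices of features whose score is positive."""
    scores = feature_scores(points, must_link, cannot_link)
    return [d for d, s in enumerate(scores) if s > 0]


# Hypothetical toy data: feature 0 separates the two groups, feature 1 is noise.
points = [
    (0.0, 5.0), (0.1, -3.0),   # group A
    (4.0, 4.9), (4.1, -3.1),   # group B
]
must_link = [(0, 1), (2, 3)]     # pairs known to share a cluster
cannot_link = [(0, 2), (1, 3)]   # pairs known to be in different clusters

print(relevant_features(points, must_link, cannot_link))  # prints [0]
```

With only four constraints, the noisy feature is rejected and the separating feature is kept. This mirrors, in a much simpler form, the abstract's claim that a small number of constraints can guide the subspace search.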
Keywords/Search Tags: Semi-Supervised Learning, Subspace Clustering, Inconsistent Constraints, Model-based