
Semi-supervised Subspace Clustering Based On Space-level Constraint

Posted on: 2010-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qiu
GTID: 2178360302460833
Subject: Computer application technology
Abstract/Summary:
Cluster analysis has a wide range of applications and is very important for data mining. According to their different applications, clustering algorithms are divided into four broad categories: partitioning methods, hierarchical methods, grid-based methods, and density-based methods. At present, how to deal with large-scale, high-dimensional datasets is one of the hot and difficult issues in data mining. Because high-dimensional data are sparse, traditional clustering algorithms are often unable to obtain the desired results on such data.

Subspace clustering is a newer clustering technique proposed for high-dimensional datasets. It extends traditional clustering to high-dimensional data; its main idea is to localize the search for clusters to the relevant dimensions. Representative algorithms include CLIQUE, PROCLUS, and ORCLUS. However, no dimension-selection method is yet available that gives satisfactory results.

To deal with the problems caused by high-dimensional datasets, we use a semi-supervised learning method. We propose a new semi-supervised subspace clustering algorithm that uses domain knowledge, which traditional subspace clustering ignores. Our algorithm focuses on domain knowledge expressed in the form of constraints. On the one hand, it uses inconsistent constraints to find the subspace search direction, which reduces the time needed to select the relevant dimensions and improves the accuracy of the selection. On the other hand, it uses the constraint points to form the cluster centroids, which improves the efficiency of clustering. In addition, because our algorithm uses the constraints to select dimensions, it not only keeps the advantages of subspace clustering algorithms but also avoids their weakness of depending on user-supplied parameters.

We test our algorithm on both synthetic and real datasets. The experimental results show that it performs well on high-dimensional datasets and is more efficient than FINDIT, PROCLUS, and ORCLUS.
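
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the two ideas it names: using constraints to pick relevant dimensions, and using constrained points to seed cluster centroids. It is not the thesis's algorithm. It assumes the constraints are must-link and cannot-link pairs of point indices (the abstract does not define "inconsistent constraints"), and the function names `rank_dimensions`, `seed_centroids`, and `assign`, the scoring rule, and the toy data are all hypothetical.

```python
import numpy as np

def rank_dimensions(X, must_link, cannot_link):
    """Order dimensions from most to least consistent with the constraints.

    Assumed scoring rule (not from the thesis): a dimension is relevant when
    must-link pairs are close in it and cannot-link pairs are far apart.
    """
    ml = np.asarray(must_link)
    cl = np.asarray(cannot_link)
    ml_gap = np.abs(X[ml[:, 0]] - X[ml[:, 1]]).mean(axis=0)
    cl_gap = np.abs(X[cl[:, 0]] - X[cl[:, 1]]).mean(axis=0)
    # Smallest (ml_gap - cl_gap) first, i.e. best dimension first.
    return np.argsort(ml_gap - cl_gap)

def seed_centroids(X, groups, dims):
    """One centroid per group of constrained points, in the chosen subspace."""
    return np.array([X[g][:, dims].mean(axis=0) for g in groups])

def assign(X, centroids, dims):
    """Assign every point to its nearest centroid within the subspace."""
    diff = X[:, dims][:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=2).argmin(axis=1)

# Toy data: 6 dimensions, only the first two separate the two clusters.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 6))
X[:20, :2] = rng.normal(0.0, 0.1, size=(20, 2))
X[20:, :2] = rng.normal(5.0, 0.1, size=(20, 2))

must_link = [(0, 1), (2, 3), (20, 21)]      # pairs assumed to share a cluster
cannot_link = [(0, 20), (1, 22), (3, 25)]   # pairs assumed to be in different clusters
groups = [[0, 1, 2, 3], [20, 21]]           # constrained points grouped by must-link

dims = rank_dimensions(X, must_link, cannot_link)[:2]
centroids = seed_centroids(X, groups, dims)
labels = assign(X, centroids, dims)
```

In this sketch the number of retained dimensions (two) is still chosen by hand; the abstract indicates the thesis derives dimension selection from the constraints themselves, which this example does not attempt to reproduce.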
Keywords/Search Tags: Semi-supervised Learning, Clustering, Subspace Clustering, Inconsistent Constraints, High Dimensional Dataset