Semi-supervised Subspace Clustering Based On Space-level Constraint

Posted on:2010-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y Qiu

GTID:2178360302460833

Subject:Computer application technology

Abstract/Summary:

Cluster analysis is an important data mining technique with a wide range of applications. Across these many applications, clustering algorithms are commonly divided into four broad categories: partitioning methods, hierarchical methods, grid-based methods, and density-based methods. Handling large-scale, high-dimensional data sets is currently one of the most active and difficult problems in data mining. Because high-dimensional data are sparse, traditional clustering algorithms often fail to produce the desired results on such data.

Subspace clustering is a newer clustering technique proposed for high-dimensional data sets. It extends traditional clustering to high-dimensional data; its main idea is to localize the search for clusters to the relevant dimensions. Representative algorithms include CLIQUE, PROCLUS, and ORCLUS. However, a dimension-selection method that yields satisfactory results is still not available.

To address this problem, we apply semi-supervised learning. We propose a new semi-supervised subspace clustering algorithm that uses domain knowledge, which traditional subspace clustering ignores. Our algorithm focuses on the form of the constraints. On the one hand, it uses inconsistent constraints to find the subspace search direction, which reduces the time needed to select the relevant dimensions and improves the accuracy of the selection. On the other hand, it uses the constraint points to form the cluster centroids, which improves the efficiency of clustering. In addition, because the algorithm uses the constraints to select dimensions, it retains the advantages of subspace clustering algorithms while avoiding their weakness of depending on user-supplied parameters.

We test our algorithm on both synthetic and real data sets. The experimental results show that it performs well on high-dimensional data and is more efficient than FINDIT, PROCLUS, and ORCLUS.
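
The two uses of constraints described in the abstract can be illustrated with a small sketch. This is not the thesis algorithm, whose details are not given here. It assumes that "inconsistent constraints" are pairs of points known to belong to different clusters, that a dimension is relevant when such pairs are far apart along it, and that each centroid is the mean of the constraint points of one cluster. The function names, the above-average threshold, and the toy data are all hypothetical.

```python
import numpy as np

def select_dimensions(X, inconsistent_pairs):
    """Keep dimensions along which the constrained pairs are well separated.

    Assumed heuristic: a dimension is kept if its mean absolute gap over
    the pairs is above the average gap of all dimensions, so no
    user-supplied dimension count is needed.
    """
    pairs = np.asarray(inconsistent_pairs)
    gaps = np.abs(X[pairs[:, 0]] - X[pairs[:, 1]]).mean(axis=0)
    return np.flatnonzero(gaps > gaps.mean())

def seed_centroids(X, seeds):
    """Form each centroid as the mean of the constraint points of a cluster."""
    return {label: X[idx].mean(axis=0) for label, idx in seeds.items()}

def assign(X, centroids, dims):
    """Assign each point to the nearest centroid, measured only on `dims`."""
    labels = sorted(centroids)
    C = np.stack([centroids[label][dims] for label in labels])
    # Mean absolute difference over the selected dimensions.
    dist = np.abs(X[:, dims][:, None, :] - C[None, :, :]).mean(axis=2)
    return np.array(labels)[dist.argmin(axis=1)]

# Toy data: dimensions 0 and 1 separate the two groups, 2 and 3 are noise.
X = np.array([
    [0.0, 0.0, 5.0, 1.0],
    [0.1, 0.2, 1.0, 5.0],
    [0.2, 0.1, 3.0, 3.0],
    [10.0, 10.0, 5.0, 1.2],
    [10.1, 9.9, 1.0, 5.0],
    [9.9, 10.2, 3.0, 3.1],
])
seeds = {0: [0, 1], 1: [3, 4]}          # constraint points per cluster
inconsistent_pairs = [(0, 3), (1, 4)]   # pairs known to be in different clusters

dims = select_dimensions(X, inconsistent_pairs)
labels = assign(X, seed_centroids(X, seeds), dims)
print(dims.tolist())    # [0, 1]
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy case the constraints alone identify the two informative dimensions, and the unconstrained points (rows 2 and 5) are assigned using only those dimensions.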

Keywords/Search Tags:

Semi-supervised Learning, Clustering, Subspace Clustering, Inconsistent Constraints, High Dimensional Dataset

Related items

1	Model-based Semi-supervised Subspace Clustering Algorithm Analysis
2	Research On Clustering Ensemble Methods And Their Applications
3	Semi-supervised Clustering With Constraints Assessment
4	The Study Of Semi-supervised Subspace Clustering And Its Applications
5	Research On Semi-supervised Clustering Algorithms With Pairwise Constraints
6	Research On Two Clustering Algorithms Based On Semi-Supervised Learning
7	Semi-supervised Clustering Algorithm And Implementation Based On Seeds Set And Pairwise Constraints
8	Research On Sparse Subspace Clustering Models And Algorithms Based On Low-rank Representation
9	Semi-supervised Clustering Based On Constraints For Images Segmentation
10	Semi-supervised Dimensional Reduction For Discriminative Clustering Analysis