Research On Improved Subspace Clustering Algorithm

Posted on:2009-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yan

GTID:2178360272470275

Subject:Software engineering

Abstract/Summary:

Cluster analysis is an important data mining technique with a wide range of applications. Researchers have proposed many algorithms for different applications, including partitioning, hierarchical, grid-based, and density-based methods. Current work on clustering focuses on large-scale, high-dimensional data sets, where traditional clustering algorithms are ineffective because the data are sparse. Subspace clustering, which aims to solve the clustering problem in high-dimensional data, is a new and important branch of cluster analysis. It extends traditional clustering algorithms by localizing the search for relevant dimensions, which makes it an effective and broadly applicable way to cluster high-dimensional data. Representative algorithms include CLIQUE and SUBCLU. In real data sets, however, subspaces of different dimensionality have different densities. These algorithms are not effective on practical data sets, both because of the high dimensionality and because they handle only a single type of data.
This thesis presents KSCSC, a novel subspace clustering algorithm for high-dimensional data based on the k most similar clusters. KSCSC finds the k most similar clusters according to inter-cluster similarity, uses them to determine the direction of the subspace search, discovers different subspaces through different local density thresholds, and clusters both continuous and categorical data. Because the local density threshold is derived from the actual data distribution, it improves scalability and accuracy and avoids the shortcoming of other algorithms, which depend on user-supplied parameters. Experiments on several data sets suggest that KSCSC can cluster both continuous and categorical data in high-dimensional settings and is more efficient than CLIQUE, SUBCLU, and ROCK.
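
The point about densities differing across subspaces can be illustrated with a small grid-based sketch in the spirit of CLIQUE. This is not the KSCSC algorithm, whose details the abstract does not give. The helper names, the toy data, and the rule used for the local threshold (a multiple of the expected count per cell under a uniform distribution) are all assumptions made for illustration only.

```python
from collections import Counter

def grid_cell(point, dims, bins):
    """Map a point in [0, 1)^d to its grid cell in the subspace `dims`."""
    return tuple(min(int(point[d] * bins), bins - 1) for d in dims)

def dense_units(points, dims, bins, threshold):
    """Return {cell: count} for cells holding at least `threshold` points."""
    counts = Counter(grid_cell(p, dims, bins) for p in points)
    return {cell: n for cell, n in counts.items() if n >= threshold}

def local_threshold(n_points, n_dims, bins, factor=1.5):
    """Hypothetical local threshold (an assumption, not the thesis's rule):
    `factor` times the expected points per cell if data were uniform."""
    return factor * n_points / (bins ** n_dims)

BINS = 4

# Toy data: one background point at the centre of every 3-D grid cell ...
background = [((i + 0.5) / BINS, (j + 0.5) / BINS, (k + 0.5) / BINS)
              for i in range(BINS) for j in range(BINS) for k in range(BINS)]
# ... plus 16 points that are clustered in dimensions 0 and 1 only
# and spread evenly along dimension 2.
cluster = [(0.1, 0.1, (k + 0.5) / 16) for k in range(16)]
points = background + cluster

# A fixed threshold tuned for 1-D subspaces sees nothing in 2-D,
# because every cell holds fewer points as dimensionality grows.
fixed_1d = dense_units(points, (0,), BINS, 25)
fixed_2d = dense_units(points, (0, 1), BINS, 25)

# A threshold that scales with the subspace keeps the 2-D cluster visible.
local_2d = dense_units(points, (0, 1), BINS,
                       local_threshold(len(points), 2, BINS))

print(fixed_1d)  # {(0,): 32}
print(fixed_2d)  # {}
print(local_2d)  # {(0, 0): 20}
```

With the fixed threshold, the cluster is detected only in the 1-D projection and is lost in the 2-D subspace where it actually lives. The threshold that adapts to the subspace recovers it without retuning, which is the kind of behavior the abstract attributes to a local density threshold.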

Keywords/Search Tags:

Clustering analysis, Subspace clustering, High dimensional data, Data mining

Related items

1	Research On Subspace Clustering Algorithm For High Dimensional Data
2	Research On Clustering Algorithm For High Dimensional Data
3	The Research On Subspace Clustering For High Dimensional Data
4	Study On High-dimensional Data Subspace Clustering Analysis And Application
5	Research On Subspace Clustering Algorithms For High-dimensional Data
6	Research On Key Technologies Of Clustering High-dimensional Data Based On Sparse Subspace And Their Applications
7	Research On Clustering Algorithms For High-Dimensional Data
8	Improvement Research Of Clustering Algorithm Based On High-dimensional Data
9	Research On Clustering Algorithms For High-Dimensional Data
10	Research On Some Algorithms For High-Dimensional Data Clustering