
Research On Improved Subspace Clustering Algorithm

Posted on: 2009-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
GTID: 2178360272470275
Subject: Software engineering
Abstract/Summary:
Cluster analysis has a wide range of applications and is an important part of data mining. Researchers have proposed many algorithms for different applications, including partitioning, hierarchical, grid-based, and density-based methods. Current work on clustering focuses on large-scale, high-dimensional data sets. Traditional clustering algorithms are not effective on the sparse data found in high-dimensional spaces. Subspace clustering, which aims to solve the clustering problem for high-dimensional data, is a new and important branch of cluster analysis. High-dimensional clustering plays a vital role among clustering algorithms and is broadly applicable. Subspace clustering extends traditional clustering algorithms by localizing the search for relevant dimensions, and is an effective way to cluster high-dimensional data. Representative algorithms include CLIQUE and SUBCLU. In real data sets, however, subspaces of different dimensionality have different densities. The subspace clustering algorithms above are therefore not effective on practical data sets, both because of the high-dimensional environment and because they handle only a single type of data.

A novel subspace clustering algorithm based on the k most similar clusters, called KSCSC, is presented in this paper to cluster high-dimensional data. KSCSC finds the k most similar clusters using the similarity between clusters, uses them to guide the direction of the subspace search, discovers different subspaces through different local density thresholds, and clusters both continuous and categorical data. It sets the local density threshold according to the actual data distribution to improve scalability and accuracy. The local density threshold avoids a shortcoming of other algorithms, which depend on user-given parameters.

Several experiments on different data sets are performed, and the results suggest that KSCSC can cluster both continuous and categorical data in high-dimensional environments and is more efficient than CLIQUE, SUBCLU and ROCK.
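The idea of a density threshold that adapts to each subspace can be illustrated with a small grid-based sketch in the spirit of CLIQUE-style dense units. This is not the KSCSC algorithm from the thesis, whose details the abstract does not give. The function names, the assumption that coordinates lie in [0, 1), and the threshold rule (a factor times the mean occupancy of non-empty cells) are all hypothetical choices made for illustration.

```python
from collections import Counter


def cell_counts(points, dims, bins):
    """Count points per grid cell in the subspace spanned by `dims`.

    Assumes every coordinate lies in [0, 1); `bins` is the number of
    equal-width intervals per dimension.
    """
    counts = Counter()
    for p in points:
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        counts[cell] += 1
    return counts


def local_threshold(counts, factor=1.5):
    """Hypothetical local rule: `factor` times the mean occupancy
    of the non-empty cells in this particular subspace."""
    return factor * sum(counts.values()) / len(counts)


def dense_units(points, dims, bins, factor=1.5):
    """Return the grid cells of subspace `dims` whose point count
    reaches the threshold computed for that subspace."""
    counts = cell_counts(points, dims, bins)
    if not counts:
        return set()
    tau = local_threshold(counts, factor)
    return {cell for cell, n in counts.items() if n >= tau}


# Four points share a narrow range in dimension 0 but are spread out
# in dimension 1, so a dense unit exists only in the subspace (0,).
points = [(0.05, 0.1), (0.06, 0.5), (0.07, 0.9),
          (0.08, 0.3), (0.55, 0.7), (0.95, 0.2)]
print(dense_units(points, (0,), bins=4))
print(dense_units(points, (0, 1), bins=4))
```

Because the threshold is recomputed from the cell counts of each subspace, no single global density parameter has to suit subspaces of different dimensionality. Only the grid resolution and the factor remain as inputs in this sketch.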
Keywords/Search Tags: Clustering analysis, Subspace clustering, High dimensional data, Data mining