Research On Subspace Clustering Algorithms Based On Density

Posted on: 2010-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wu
Full Text: PDF
GTID: 2178360275494445
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is one of the most important research fields in data mining. It aims to group a set of data objects into classes of similar objects, and it has wide application prospects. With the development of technology, the data in many application fields are high dimensional, and many of the dimensions are often irrelevant. These irrelevant dimensions can confuse clustering algorithms by hiding clusters in noisy data. Moreover, as dimensionality increases, data usually become increasingly sparse, a phenomenon known as the curse of dimensionality. When the data become sufficiently sparse, all data points appear almost equally distant from one another, and the distance measure, which is essential for cluster analysis, becomes meaningless. Subspace clustering is one solution to this challenge. It searches for clusters within different subspaces of the same dataset, and it has many advantages that traditional clustering methods do not possess.

This dissertation focuses on density-based subspace clustering algorithms. Its contents are as follows.

First, the basic concepts and primary algorithms of cluster analysis, and techniques for clustering high dimensional data, are introduced. Conventional subspace clustering methods and their relative merits are then discussed.

Many traditional methods enumerate clusters in all subsets of attributes, which produces many redundant clusters. This dissertation proposes a subspace clustering algorithm named NRSC that finds non-redundant clusters. It assigns each object to the cluster in the highest dimensional subspace and reduces the number of clusters automatically. The resulting clusters are easier to comprehend.

Many density-based subspace clustering methods suffer from huge memory consumption. A second novel algorithm, called DMaxC, is proposed to alleviate the memory problem. DMaxC partitions the feature space via maximum cliques and then searches for clusters based on the partitions. This resolves the conflict between memory limits and the high dimensionality of the data. DMaxC captures the shape and extent of a cluster by references, and search costs are effectively reduced.
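
The loss of distance contrast mentioned in the abstract can be illustrated with a small experiment. The sketch below is not taken from the dissertation: the function name `relative_contrast`, the uniform random data, the point count, and the chosen dimensionalities are all assumptions made for illustration. It measures how much the farthest point differs from the nearest one, relative to the nearest distance, as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points uniform random points in the unit cube.

    Hypothetical helper for illustration only; not part of NRSC or DMaxC.
    """
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


rng = random.Random(0)  # fixed seed so that runs are repeatable
contrasts = {d: relative_contrast(200, d, rng) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

On uniform data of this kind the relative contrast tends to shrink as dimensions are added, which is the sense in which a full-space distance measure stops discriminating between near and far points. Real datasets with irrelevant dimensions may behave differently in detail.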
Keywords/Search Tags: Cluster analysis, High dimensional data, Subspace clustering