Research On Subspace Clustering Algorithms Based On Density

Posted on:2010-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:J J Wu

GTID:2178360275494445

Subject:Computer software and theory

Abstract/Summary:

Cluster analysis is one of the most important research fields in data mining. It aims to group a set of data objects into classes of similar objects, and it has wide application prospects. With the development of technology, the data in many application fields are high dimensional, and many of the dimensions are often irrelevant. These irrelevant dimensions can confuse clustering algorithms by hiding clusters in noisy data. Moreover, as dimensionality increases, data usually become increasingly sparse, a phenomenon known as the curse of dimensionality. When the data become very sparse, all data points appear nearly equidistant from one another, and the distance measure, which is essential for cluster analysis, becomes meaningless. Subspace clustering is one solution to this challenge. It searches for clusters within different subspaces of the same dataset, and it has many advantages that traditional clustering methods do not possess.

This dissertation focuses on density-based subspace clustering algorithms and covers the following:

First, the basic concepts and primary algorithms of cluster analysis, and techniques for clustering high dimensional data, are introduced. Then, conventional subspace clustering methods and their relative merits are discussed.

Many traditional methods enumerate clusters in all subsets of attributes and therefore produce many redundant clusters. This dissertation proposes a subspace clustering algorithm named NRSC that finds non-redundant clusters. It assigns each object to the cluster in the highest dimensional subspace and reduces the number of clusters automatically. The resulting clusters are easier to comprehend.

Many density-based subspace clustering methods suffer from huge memory consumption. A second novel algorithm, called DMaxC, is proposed to alleviate the memory problem. DMaxC partitions the feature space via maximum cliques and then searches for clusters based on the partitions. This resolves the conflict between memory consumption and the high dimensionality of the data. DMaxC captures the shape and extent of a cluster by references, and search costs are reduced effectively.
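
The abstract's remark that distance measures lose meaning in sparse, high dimensional data can be illustrated with a small experiment. The sketch below is not taken from the thesis: the function name `relative_contrast`, the uniform data model, the sample size, and the dimensionalities are all assumptions chosen for illustration. It measures how much farther the farthest point is than the nearest point from a random query, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for distances from one query point.

    Points are drawn uniformly from the unit hypercube. This data model
    is an assumption made for illustration, not the thesis's setup.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


low = relative_contrast(n_points=200, n_dims=2)
high = relative_contrast(n_points=200, n_dims=500)
print(f"2 dimensions:   relative contrast = {low:.2f}")
print(f"500 dimensions: relative contrast = {high:.2f}")
```

Under these assumptions the contrast in 500 dimensions should come out far smaller than in 2 dimensions, meaning the nearest and farthest neighbours become hard to tell apart. This is the effect that motivates searching for clusters in lower dimensional subspaces.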

Keywords/Search Tags:

Cluster analysis, High dimensional data, Subspace clustering

Related items

1	Research On Subspace Clustering Algorithm For High Dimensional Data
2	Research On Subspace Clustering Algorithms Based On Density
3	Research On Clustering Algorithms For High-Dimensional Data
4	Research On Projective Clustering Algorithms With Applications For High-dimensional Data
5	Research On Subspace Clustering Algorithms For High-dimensional Data
6	Research On Clustering Methods For High Dimensional Data And Their Applications
7	A New High-dimensional Data Clustering Algorithm Based On GAs
8	Research And Application Of Soft Subspace Clustering Algorithms
9	Research On Improved Subspace Clustering Algorithm
10	The Research On Common Subspace Recognition Method For High Dimensional Data