Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement

Posted on:2003-06-17

Degree:M.Sc

Type:Thesis

University:University of Alberta (Canada)

Candidate:Foss, Andrew Philip Ogilvie

Full Text:PDF

GTID:2468390011489670

Subject:Computer Science

Abstract/Summary:

Clustering is the problem of grouping data based on similarity and consists of maximizing the intra-group similarity while minimizing the inter-group similarity. The problem of clustering data sets is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the famous k for the number of expected clusters, which constitutes supervision of a sort.

This thesis reviews attempts made to date to resolve the problems in clustering and presents two new, fast and scalable clustering algorithms that need no user input parameters. The first, TURN, is well suited to categorical data, while the second, TURN*, automatically finds interesting resolution levels in spatial data, yielding effective and efficient discovery of arbitrarily shaped clusters in the presence of noise. The experiments show that TURN works well without parameter tuning in comparison to another leading algorithm suited to categorical data, while TURN* outperforms most existing clustering algorithms in quality and speed for large data sets.
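
The clustering objective stated in the abstract (high similarity within groups, low similarity between groups) can be illustrated with a small sketch. This is not the TURN or TURN* algorithm, whose details are not given here. It only scores a given grouping of categorical records, and it assumes a simple matching similarity (the fraction of attributes on which two records agree). The function names and the toy records are hypothetical.

```python
from itertools import combinations


def matching_similarity(a, b):
    """Fraction of attributes on which two categorical records agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_similarities(records, labels):
    """Return (mean intra-group similarity, mean inter-group similarity)."""
    intra, inter = [], []
    for i, j in combinations(range(len(records)), 2):
        sim = matching_similarity(records[i], records[j])
        # A pair counts as intra-group only when both records share a label.
        if labels[i] == labels[j]:
            intra.append(sim)
        else:
            inter.append(sim)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


# Hypothetical toy data: four records with three categorical attributes each.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

good_intra, good_inter = mean_similarities(records, [0, 0, 1, 1])
bad_intra, bad_inter = mean_similarities(records, [0, 1, 0, 1])

print("sensible grouping:", good_intra, good_inter)
print("poor grouping:    ", bad_intra, bad_inter)
```

A grouping that matches the structure of the data scores higher within groups than between them, while a poor grouping shows the reverse. A clustering algorithm searches for a grouping of the first kind without being given the labels.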

Keywords/Search Tags:

Clustering, Categorical data, TURN*, Spatial data, Inter-group similarity, Data sets

Related items

1	Studies On Clustering Algorithms For Categorical Data
2	The Research Of Ant-Based Clustering Algorithm For Data Sets With Mixed Attribute
3	A Research On Spatial Data Mining
4	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
5	A Study On Clustering Algorithms For Categorical Data With Applications
6	Categorical Relation Graph Construction And Clustering Analysis For Categorical Data
7	Research On Clustering Based On Attribute Characteristics For Categorical And Binary Data
8	Studies On Clustering Algorithms For Categorical Data
9	The Research On Clustering Algorithm For Categorical Data Based On Rough Set
10	Research And Application Of Rough Clustering Algorithm For High Dimensional Data Sets