Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement

Posted on: 2003-06-17
Degree: M.Sc.
Type: Thesis
University: University of Alberta (Canada)
Candidate: Foss, Andrew Philip Ogilvie
Full Text: PDF
GTID: 2468390011489670
Subject: Computer Science
Abstract/Summary:
Clustering is the problem of grouping data based on similarity; it consists of maximizing the intra-group similarity while minimizing the inter-group similarity. The problem of clustering data sets is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the famous k for the number of expected clusters, which constitutes supervision of a sort.

This thesis reviews attempts made to date to resolve the problems in clustering and presents two new, efficient, fast and scalable clustering algorithms that need no user input parameters. The first, TURN, is well suited to categorical data, while the second, TURN*, automatically finds interesting resolution levels in spatial data, yielding effective and efficient discovery of arbitrarily shaped clusters in the presence of noise. The experiments show that TURN, without parameter tuning, performs well in comparison to another leading algorithm suited to categorical data, while TURN* outperforms most existing clustering algorithms in quality and speed on large data sets.
Keywords/Search Tags: Clustering, Categorical data, TURN*, Spatial data, Inter-group similarity, Data sets