A cohesion-based clustering technique for categorical data

Posted on: 2007-12-10
Degree: M.Comp.Sc
Type: Thesis
University: Concordia University (Canada)
Candidate: Nemalhabib, Aida
Full Text: PDF
GTID: 2448390005972307
Subject: Computer Science
Abstract/Summary:
Clustering aims to partition a given dataset of objects into groups of similar objects. In this work, we consider categorical data, whose values, unlike those of numerical data, have no inherent order, which makes clustering such data a more challenging task. We propose a clustering technique for categorical data that uses a novel similarity function, called cohesion, to measure the degree to which objects "stick" to clusters. We have implemented this technique, which we refer to as CLUC (CLUstering with Cohesion). To evaluate CLUC, we compared its results with those produced by well-known clustering algorithms. The results of our extensive experiments on real and synthetic datasets show that CLUC generates high-quality clusters that conform more closely to clusterings by human experts than those of the compared algorithms. For some well-known real datasets, CLUC even discovers clusterings identical to those provided by experts. Our results also indicate that CLUC is, in general, insensitive to the order of the input objects and is scalable as the dataset grows in size (the number of objects) and/or dimensionality (the number of attributes).
Keywords/Search Tags: Data, Clustering, Technique, Objects, CLUC, Categorical