Font Size: a A A

Isotropy criteria and algorithms for data clustering

Posted on:2012-02-12Degree:Ph.DType:Dissertation
University:University of Illinois at Urbana-ChampaignCandidate:Shetty, Sanketh VittaldasFull Text:PDF
GTID:1468390011464714Subject:Engineering
Abstract/Summary:
Given a set of points, the goal of data clustering is to group them into clusters, such that the internal homogeneity of points within each cluster contrasts to inter-cluster heterogeneity. Over the last fifty years, many methods for data clustering have been developed in diverse scientific communities. However, many of these methods suffer from several shortcomings, and are unable to handle the rich diversity of cluster structures that are usually present in data.;We develop an unsupervised, nonparametric approach to data clustering that addresses these shortcomings. Our goal is to build on the strengths of these methods, while simultaneously offering innovative solutions to their limitations. In our cluster model, clusters are seen as groups of points, with overlapping neighborhoods, that have similar spatial structures that are in contrast with their surroundings. We use the isotropy of a point distribution to characterize spatial structure. We argue that identifying the isotropic density neighborhoods of a point, helps in the detection of a diversity of cluster structures that are challenging to many other methods. We develop three different criteria for identifying neighborhoods with isotropic density. The first criterion is based on examining properties of one-dimensional projections in a hyperspherical neighborhood with uniform point distribution. The second and third criteria are based on the analysis of the force transform, generalized to arbitrary inner product spaces. We use these criteria to develop methods for data clustering.;In our first approach, we define a cluster as a set of contiguous interior points surrounded by border points. We use the isotropy criteria as a threshold to differentiate between interior and border points. We group interior points to form cluster cores, and then identify cluster borders as formed by the border points neighboring the cluster cores. The algorithm is effective in resolving clusters of different shapes, sizes and densities. It is relatively insensitive to outliers.;In our second approach, we adopt the idea of shift vector computation for cluster detection. We present a novel scale-adaptive approach where clusters are detected by moving all points to their cluster cores using shift vectors. We use the isotropic density neighborhoods of a point to determine where and how the shift vectors are computed. We then construct a directed graph induced by these shift vectors. We develop several algorithms for the analysis of this digraph to produce clusters. They range from a simple directed-tree approach to sophisticated spectral analysis of the digraph.;We evaluate these algorithms on challenging real datasets and compare their performance against popular and classic algorithms in data clustering, such as k-means, mixture model clustering, spectral clustering, density-based and shift based methods. We show that our methods outperform these algorithms on many challenging clustering tasks. We also apply our algorithms to image and video segmentation.
Keywords/Search Tags:Cluster, Algorithms, Points, Criteria, Methods, Isotropy
Related items