Isotropy criteria and algorithms for data clustering

Posted on:2012-02-12

Degree:Ph.D

Type:Dissertation

University:University of Illinois at Urbana-Champaign

Candidate:Shetty, Sanketh Vittaldas

Full Text:PDF

GTID:1468390011464714

Subject:Engineering

Abstract/Summary:

Given a set of points, the goal of data clustering is to group them into clusters, such that the internal homogeneity of points within each cluster contrasts to inter-cluster heterogeneity. Over the last fifty years, many methods for data clustering have been developed in diverse scientific communities. However, many of these methods suffer from several shortcomings, and are unable to handle the rich diversity of cluster structures that are usually present in data.;We develop an unsupervised, nonparametric approach to data clustering that addresses these shortcomings. Our goal is to build on the strengths of these methods, while simultaneously offering innovative solutions to their limitations. In our cluster model, clusters are seen as groups of points, with overlapping neighborhoods, that have similar spatial structures that are in contrast with their surroundings. We use the isotropy of a point distribution to characterize spatial structure. We argue that identifying the isotropic density neighborhoods of a point, helps in the detection of a diversity of cluster structures that are challenging to many other methods. We develop three different criteria for identifying neighborhoods with isotropic density. The first criterion is based on examining properties of one-dimensional projections in a hyperspherical neighborhood with uniform point distribution. The second and third criteria are based on the analysis of the force transform, generalized to arbitrary inner product spaces. We use these criteria to develop methods for data clustering.;In our first approach, we define a cluster as a set of contiguous interior points surrounded by border points. We use the isotropy criteria as a threshold to differentiate between interior and border points. We group interior points to form cluster cores, and then identify cluster borders as formed by the border points neighboring the cluster cores. The algorithm is effective in resolving clusters of different shapes, sizes and densities. It is relatively insensitive to outliers.;In our second approach, we adopt the idea of shift vector computation for cluster detection. We present a novel scale-adaptive approach where clusters are detected by moving all points to their cluster cores using shift vectors. We use the isotropic density neighborhoods of a point to determine where and how the shift vectors are computed. We then construct a directed graph induced by these shift vectors. We develop several algorithms for the analysis of this digraph to produce clusters. They range from a simple directed-tree approach to sophisticated spectral analysis of the digraph.;We evaluate these algorithms on challenging real datasets and compare their performance against popular and classic algorithms in data clustering, such as k-means, mixture model clustering, spectral clustering, density-based and shift based methods. We show that our methods outperform these algorithms on many challenging clustering tasks. We also apply our algorithms to image and video segmentation.

Keywords/Search Tags:

Cluster, Algorithms, Points, Criteria, Methods, Isotropy

Related items

1	WiFi Indoor Localization Algorithms Based On Multiple Selections Of Access Points
2	Based On The Application Of Cluster Analysis Of Water Pollution Monitoring System
3	Feature Points Matching Algorithm Research Based On Multiplex Method
4	Clustering Strategy Validity Analysis On Ambulatory Electrocardiogram
5	Design Theory And Application Of The 6-Axis F/T Sensors Based On Stewart Platform Structure
6	Study On The Methods Of Source Number Estimation
7	The Research On Bag-of-Words Based On Improved K-means And Hierarchical Cluster Algorithms
8	Skeleton Matching Based On Clustering
9	Research In Fuzzy Cluster Technology Based On Genetic Algorithms
10	Research Of Feature Points Matching Method Based On BP Neural Network