Analyzing and improving local search: K-means and ICP

Posted on:2010-03-10

Degree:Ph.D

Type:Dissertation

University:Stanford University

Candidate:Arthur, David

Full Text:PDF

GTID:1448390002471855

Subject:Computer Science

Abstract/Summary:

Lloyd's method (also known as the k-means method) and Iterative Closest Point are two prominent algorithms that use local search to sacrifice accuracy guarantees for speed and simplicity. On the large-scale data sets that arise in practice, this trade-off is appealing, and both algorithms are very widely used.;Previously however, little was known about the worst-case running time of either algorithm, and there was no satisfactory explanation as to why they are fast in practice. In this work, we begin by showing that worst-case complexity cannot offer such an explanation because surprisingly, both algorithms have super-polynomial worst-case running time. To reconcile this shortcoming with the algorithms' observed speed, we turn to the model of smoothed analysis, where running time is measured only after adding a small amount of random noise to the input. We show that in this model, both Lloyd's method and Iterative Closest Point have polynomial complexity. This implies any worst-case scenario requires a pathological placement of points, which is extremely unlikely in practice.;We also observe that Lloyd's method, for all of its advantages, can give arbitrarily bad output even on real-world data sets. We introduce k-means++, which tweaks the initialization procedure for Lloyd's method to get an algorithm with provable accuracy guarantees. Experiments show k-means++ is both faster and more accurate than Lloyd's method in practice, often by considerable margins. We believe this is the first k-means clustering algorithm that combines theoretical guarantees with positive experimental results.

Keywords/Search Tags:

K-means, Lloyd's method, Algorithm

Related items

1	Fabrication And Test Of Sub-micron Grating Based On Lloyd Mirror Interference System
2	Mosaic Image Generation Method Based On Geodesic Voronoi Diagram
3	Research On Local Region Level Set Method Based On K-means++ Algorithm
4	Research Of The Adaptive Optimization Method About K Value Of K-Means Algorithm
5	Text Clustering Based On K-means Algorithm And Realization
6	Improvement And Application Of K-means Algorithm
7	Research On K-means Method Based On Cuckoo Algorithm
8	Research On Some Fuzzy C Means Clustering Algorithm And Its Application In IDS
9	The Image Restoration Algorithm Based On Nonlocal Means Regularization Method
10	The Research On Parallel Computing Technology In Precise Agricultural Climate Division