Lloyd's method (also known as the k-means method) and Iterative Closest Point are two prominent algorithms that use local search to sacrifice accuracy guarantees for speed and simplicity. On the large-scale data sets that arise in practice, this trade-off is appealing, and both algorithms are very widely used.;Previously however, little was known about the worst-case running time of either algorithm, and there was no satisfactory explanation as to why they are fast in practice. In this work, we begin by showing that worst-case complexity cannot offer such an explanation because surprisingly, both algorithms have super-polynomial worst-case running time. To reconcile this shortcoming with the algorithms' observed speed, we turn to the model of smoothed analysis, where running time is measured only after adding a small amount of random noise to the input. We show that in this model, both Lloyd's method and Iterative Closest Point have polynomial complexity. This implies any worst-case scenario requires a pathological placement of points, which is extremely unlikely in practice.;We also observe that Lloyd's method, for all of its advantages, can give arbitrarily bad output even on real-world data sets. We introduce k-means++, which tweaks the initialization procedure for Lloyd's method to get an algorithm with provable accuracy guarantees. Experiments show k-means++ is both faster and more accurate than Lloyd's method in practice, often by considerable margins. We believe this is the first k-means clustering algorithm that combines theoretical guarantees with positive experimental results. |