Font Size: a A A

Approximation algorithms for clustering

Posted on:2006-08-12Degree:Ph.DType:Dissertation
University:Princeton UniversityCandidate:Wirth, Anthony IanFull Text:PDF
GTID:1458390008967810Subject:Computer Science
Abstract/Summary:
Clustering items into groups is a fundamental problem in the information sciences. Many typical clustering optimization problems are NP-hard and so cannot be expected to be solved optimally in a reasonable amount of time. Although the use of heuristics is common, in this dissertation we seek approximation algorithms, whose performance ratio in relation to the optimal solution can be guaranteed and whose running time is a polynomial function of the problem instance size.; We start by examining variants of the asymmetric k-center problem. We demonstrate an O(log* n)-approximation algorithm for the asymmetric weighted k-center problem. Here, the vertices have weights and we are given a total budget for opening centers. In the p-neighbor variant each vertex must have p (unweighted) centers nearby: we give an O(log* k)-bicriteria algorithm using 2k centers, for small p. We also show that the following three versions of the asymmetric k-center problem are inapproximable: priority k-center, k-supplier , and outliers with forbidden centers.; The bulk of the dissertation concerns correlation clustering : clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. The problem instance does not include a distance relation between the elements. We partition the elements into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the input judgment labeling is maximized (resp. minimized). It is worthwhile studying both complete instances, in which every pair is labeled, and general instances, in which some input pairs might not have labels.; Specifically, we demonstrate a factor 4 approximation for minimization on complete instances, and a factor O(log n) approximation for general instances. For the maximization version, we give a factor 0.7664 approximation for general instances, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete instances.; We provide the first nontrivial approximation algorithm for maximizing the correlation: the difference between the number of pairs correctly classified and the number incorrectly classified. The factor O(1/log n ) algorithm is derived from an approximation algorithm for maximizing a fairly general type of quadratic program on the unit hyper-cube.
Keywords/Search Tags:Approximation, Algorithm, Clustering, Problem, General
Related items