Approximation algorithms for clustering

Posted on:2006-08-12

Degree:Ph.D

Type:Dissertation

University:Princeton University

Candidate:Wirth, Anthony Ian

Full Text:PDF

GTID:1458390008967810

Subject:Computer Science

Abstract/Summary:

Clustering items into groups is a fundamental problem in the information sciences. Many typical clustering optimization problems are NP-hard and so cannot be expected to be solved optimally in a reasonable amount of time. Although the use of heuristics is common, in this dissertation we seek approximation algorithms, whose performance ratio in relation to the optimal solution can be guaranteed and whose running time is a polynomial function of the problem instance size.; We start by examining variants of the asymmetric k-center problem. We demonstrate an O(log* n)-approximation algorithm for the asymmetric weighted k-center problem. Here, the vertices have weights and we are given a total budget for opening centers. In the p-neighbor variant each vertex must have p (unweighted) centers nearby: we give an O(log* k)-bicriteria algorithm using 2k centers, for small p. We also show that the following three versions of the asymmetric k-center problem are inapproximable: priority k-center, k-supplier , and outliers with forbidden centers.; The bulk of the dissertation concerns correlation clustering : clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. The problem instance does not include a distance relation between the elements. We partition the elements into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the input judgment labeling is maximized (resp. minimized). It is worthwhile studying both complete instances, in which every pair is labeled, and general instances, in which some input pairs might not have labels.; Specifically, we demonstrate a factor 4 approximation for minimization on complete instances, and a factor O(log n) approximation for general instances. For the maximization version, we give a factor 0.7664 approximation for general instances, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete instances.; We provide the first nontrivial approximation algorithm for maximizing the correlation: the difference between the number of pairs correctly classified and the number incorrectly classified. The factor O(1/log n ) algorithm is derived from an approximation algorithm for maximizing a fairly general type of quadratic program on the unit hyper-cube.

Keywords/Search Tags:

Approximation, Algorithm, Clustering, Problem, General

Related items

1	Research On Algorithm Of Clustering Deformation Problem
2	Research On Approximation Algorithm For Min-Max Cycle Cover Problem On A Mixed Graph
3	Research And Design Of General Accelerator For Various Clustering Algorithms
4	The Systematic Solution To The Privacy-preserving General Geometric Intersection Problem
5	Research On Clustering Algorithm Based On Approximation Set Theory Of Rough Set
6	Study On Clustering Problem Based On Improved Cuckoo Algorithm
7	Approximation Algorithms For Multi-Vehicle Stacker Crane Problems
8	Research On Recommendation Algorithm Based On Clustering And Weighted Matrix Approximation
9	Researches Of Packing Problem With Time Dimension
10	Research And Application Of Approximation Algorithm For Multiple Traveling Salesman Problem