Font Size: a A A

Approximation algorithms for clustering problems

Posted on:2005-04-06Degree:Ph.DType:Thesis
University:Cornell UniversityCandidate:Swamy, ChaitanyaFull Text:PDF
GTID:2450390008994347Subject:Computer Science
Abstract/Summary:
Clustering is a ubiquitous problem that arises in many applications in different fields such as data mining, image processing, machine learning, and bioinformatics. Clustering problems have been extensively studied as optimization problems with various objective functions in the Operations Research and Computer Science literature. We focus on a class of objective functions more commonly referred to as facility location problems. These problems arise in a wide range of applications such as, plant or warehouse location problems, cache placement problems, and network design problems where the costs obey economies of scale.; In the simplest of these problems, the uncapacitated facility location (UFL) problem, we want to open facilities at some subset of a given set of locations and assign each client in a given set D to an open facility so as to minimize the sum of the facility opening costs and client assignment costs. This a very well-studied problem; however it fails to address many of the requirements of real applications. In this thesis we consider various problems that build upon UFL and capture additional issues that arise in applications such as, uncertainties in the data, clients with different service needs, and facilities with interconnectivity requirements. By focusing initially on facility location problems in these new models, we develop new algorithmic techniques that will find application in a wide range of settings.; We consider a widely used paradigm in stochastic programming to model settings where the underlying data, for example, the locations or demands of the clients, may be uncertain: the 2-stage with recourse model that involves making some initial decisions, observing additional information, and then augmenting the initial decisions, if necessary, by taking recourse actions. We present a randomized polynomial time algorithm that solves a large class of 2-stage stochastic linear programs (LPs) to near-optimality with high probability. We exploit this tool to devise the first approximation algorithms for various 2-stage discrete stochastic problems such as the stochastic versions of the set cover, vertex cover, and facility location problems, when the underlying random data is only given as a "black box" and no restrictions are placed on the cost structure.; We introduce the facility location with service installation costs problem to model applications involving clients with different service requirements. if the service requested by it has been installed at the facility (incurring a service installation cost). The connected facility location problem captures settings where the open facilities want to communicate with each other or with a central authority; we model this by requiring that the open facilities be interconnected by a Steiner tree. We give intuitive and efficient algorithms for both these problems. We use these algorithms to obtain approximation algorithms for the kappa-median variants of these problems, where in addition to all of the constraints of the problem, a bound of kappa is imposed on the number of facilities that may be opened.
Keywords/Search Tags:Problem, Approximation algorithms, Facility location, Facilities, Applications, Open, Data
Related items