Font Size: a A A

Graph and geometric algorithms on distributed networks and databases

Posted on:2012-02-27Degree:Ph.DType:Thesis
University:Georgia Institute of TechnologyCandidate:Nanongkai, DanuponFull Text:PDF
GTID:2468390011467798Subject:Applied Mathematics
Abstract/Summary:
This thesis studies the powers and limits of graph and geometric algorithms when we face with massive data, sometimes distributed on several machines. In this case, efficient algorithms are often required to use sublinear resources, such as time, memory, and communication. Three specific models of our interest are distributed networks, data streams, and communication complexity. In these models, we study lower and upper bounds of many problems motivated by applications in networking and database areas, as follows.;How fast are approximation algorithms on distributed networks? We show lower bounds of many approximation graph algorithms by studying the verification problem, stated as follows. Let H be a subgraph of a network G where each vertex of G knows which edges incident on it are in H. We would like to verify whether H has some properties, e.g., if it is a tree or if it is connected (every node knows in the end of the process whether H has the specified property or not). We would like to perform this verification in a decentralized fashion via a distributed algorithm. The time complexity of verification is measured as the number of rounds of distributed communication.;In this thesis we initiate a systematic study of distributed verification, and give almost tight lower bounds on the running time of distributed verification algorithms for many fundamental problems such as connectivity, spanning connected subgraph, and s -- t cut verification.;We then use this these results to derive strong unconditional time lower bounds on the hardness of distributed approximation for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut. Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Moreover, our unconditional lower bound of approximating minimum spanning tree (MST) subsumes and improves upon the previous hardness of approximation bound of Elkin [STOC 2004] as well as the lower bound for (exact) MST computation of Peleg and Rubinovich [FOCS 1999]. Our result implies that there can be no distributed approximation algorithm for MST that is significantly faster than the current exact algorithm, for any approximation factor.;In this thesis, we study the skyline problem on the multi-pass streaming model. This model is the most restricted (least powerful) among many models that differentiates between sequential and random accesses as algorithms on this model are allowed to do a random access only once after each round of sequential accesses to the whole input.;We show that, even in this very restricted model, one can still get an efficient algorithm. Our algorithm uses space only enough to store the skyline and uses linear sequential and logarithmic random accesses (or passes in the terminology of data streams). To the best of our knowledge, it is the first randomized skyline algorithm in the literature. We also prove that this our algorithm is near-optimal.;Additionally, we show that the same idea can be used to develop a distributed algorithm that is near optimal. We also show that the algorithm can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of this algorithm via extensive experiments on both real and synthetic datasets. Our algorithm is comparable to the existing algorithms in average case and additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variations. (Abstract shortened by UMI.)...
Keywords/Search Tags:Algorithm, Distributed, Data, Graph, Lower bounds, Time
Related items