Font Size: a A A

Computing summaries over distributed data

Posted on:2014-09-02Degree:Ph.DType:Thesis
University:Hong Kong University of Science and Technology (Hong Kong)Candidate:Huang, ZengfengFull Text:PDF
GTID:2458390005986987Subject:Computer Science
Abstract/Summary:
Consider a distributed system with k nodes, where each node holds a part of the data. The goal is to extract some useful information from the entire data set or to compute some functions over the data. We are interested in designing communication-efficient algorithms and also characterizing the communication complexity for various problems. We consider both a at network structure and more complicated tree networks.;In this thesis, we study some most important statistical summaries of the underlying data, in particular item frequencies, heavy hitters, quantiles, and epsilon-approximations, which are extensively studied in database, machine learning, computational geometry and data mining. We provide general techniques for both designing efficient algorithms and proving communication lower bounds, with which we get almost tight bounds for these problems.;We also study graph problems in the distributed setting, where the edges of the input graph is partitioned across k nodes. We show how to compute an approximate maximum matching, one of most important graph problems, communication-efficiently and prove a tight lower bound for this problem. To prove this lower bound, we develop new techniques, which we believe will have a wide applicability to prove distributed communication complexity for other graph problems.
Keywords/Search Tags:Distributed, Data, Graph problems
Related items