Font Size: a A A

Distributed Graph-parallel Framework Scheduling Analysis And Optimization

Posted on:2016-04-13Degree:MasterType:Thesis
Country:ChinaCandidate:C N XieFull Text:PDF
GTID:2308330476453481Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Big-Data processing has become a hot topic in distributed computation area. However, transitional Data-Parallel processing framework does not suitable for programs with iterative, convergence-oriented computation and data dependable processing, which are the common properties of Machine Learning and Data Mining(MLDM) algorithm.Hence, the concept of Graph-Parallel processing is putforward and abstract data to be graph-structured units to compute large-scale input data iteratively until get convergence. This trend have led to the development of two different scheduling modes for graph-parallel computation, known as synchronous(Sync) and asynchronous(Async)modes receptively. But there is still no in-depth research about the characteristics of these to mode for programmer to a suitable mode to execute graph computation. And the single scheduling mode could also get a suboptimal performance for some graph algorithm.This paper makes the ?rst comprehensive characterization on the performance of the two modes on a set of typical graph-parallel applications. Our study shows that,under different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, the performance of the two modes varies signi?cantly, and no single mode consistently outperforms the other. Hence, this paper proposes Hsync, a hybrid graph-parallel scheduling mode, which could adaptively switches between Sync and Async modes for a single graph-parallel program to achieve near-optimal performance.Besides, we build system help Hsync mode to constantly collects execution statistics,combining with heuristics rules to predict and determine when a mode switch could be pro?table. Our system could also support online sampling and o?ine pro?ling approaches combined with a set of heuristics to accurately predicting future performance in the two modes.A prototype called PowerSwitch has been built based on PowerGraph, a stateof-the- art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switch between two modes.
Keywords/Search Tags:Distributed Graph Computation, Graph-Parallel, Synchronous Mode, Asynchronous Mode, Distributed System
PDF Full Text Request
Related items