In recent years,with the explosive growth in electronic information and data,distribut-ed graph-processing system has become a key technique that efficiently supports large-scale graph data processing,thus gaining increasing attention.Efficiency problem has become an important concern from the users of distributed graph-processing systems and people from the whole society,that is,the graph-computing jobs are always expected to be finished in a shorter time by using a smaller cluster,thus reducing the computation cost and the energy consumption.The requirement of high efficiency is different for two types of users accord-ing to their requirements of performance and the hardware resources possessed by them.For high-end users,they have a large number of compute nodes and possess more abundant funds,and their graph-computing jobs are usually more time-sensitive.Hence,they expect to improve the efficiency of distributed graph-processing systems by reducing the runtime,and thus obtaining the high utilization of hardware resources.For small and medium-end users,they usually expect to improve the efficiency of distributed graph-processing systems by reducing the number of compute nodes,based on the premise of the acceptable system performance.Since the hardware cost is of the most concern to them.However,existing distributed graph-processing systems fail to efficiently utilize the resources of computation,memory and network,or the combination of them,leading the low cost effectiveness,thus the low energy efficiency.The reasons are the inefficiencies of the scheduling strategy,com-putation model,data representation,programming model and communication mechanism.Due to the fact that the graph algorithms access graph data irregularly,most distributed graph-processing systems require that the graphs fit entirely in memory,during the graph processing process.However,there are three problems in in-memory distributed graph-processing systems.First,it necessitates a very large cluster to process very large graphs,such as graphs with hundreds of billions of edges.The excessive investment of a very large cluster discourages and possibly prevents many small and medium-end users from de-ploying their large-scale graph-computing jobs.Second,due to the high communication costs,these systems routinely suffer from limited performance and scalability.Third,dur-ing the whole computation process that follows the preprocessing phase,the resource of external storage is idle.In order to address the problem of low cost effectiveness in existing in-memory distributed graph-processing systems,we propose a pipeline based scheduling strategy that can schedule the graph-computing job between the memory and external stor-age.By scheduling the large-scale graph-computing job on the small cluster intelligently,this technique can hide the latencies of disk I/O and communication by overlapping the disk I/O time and the communication time of each compute node with the computation times of other compute nodes,almost reducing the overall runtime to be equal to the time spent on the computations only,thus achieving the high cost effectiveness.Based on this scheduling strategy,we implement a highly cost-effective distributed graph-processing system,called DD-Graph.Extensive evaluation indicates that,compared with GPS and Giraph,DD-Graph saves 40%N75%of hardware costs,while obtaining~10%performance improvement.Due to the inefficiencies of the vertex-centric subgraph construction method and the computation model that combines the processes of computation and communication,Bulk Synchronous Parallel(BSP)computation model based distributed graph-processing systems suffer from high communication costs,leading to limited performance,and thus reducing the energy efficiency.In order to address this problem,we propose a computation model,called LCC-BSP,that decomposes each superstep into two distinct steps of computation and communication,and combines the proposed edge-data block based subgraph construction method.The reason is based on the fact that,for graph-computing jobs,the time of com-putation step is short,and the communication step can be finished in instantaneously and simultaneously in a well-orchestrated concurrent manner.The high efficiency of communi-cation step stems from the proposed edge-data block based subgraph construction method that avoids the four factors that cause the high communication costs in existing distributed graph-processing systems.First,the bulk of the extra communication volume is attributed to the need to carry the destination vertex name on each message.Second,there are two rounds of data copying,one at the sender side and the other at the receiver side.Third,the parsing overhead is used by the receiver side to parse the received message batches.Finally,the inefficient communication techniques that lengthen the communication time.Based on this computation model,we implement a high-performance distributed graph-processing sys-tem,called LCC-Graph.Extensive evaluation indicates that LCC-Graph obtains an order of magnitude performance improvements over existing distributed graph-processing systems.Nowadays,high-bandwidth networks are easily accessible.However,due to the inef-ficiencies of the vertex-centric programming model and the vertex-target communication scheme,existing distributed graph-processing systems generate,send and receive messages so slowly that only a small fraction of the available network bandwidth is utilized,leading to very long waiting times experienced by users for the results of graph-computing.Moreover,during the execution process of the graph-computing job,large memory space is required to buffer the intermediate messages between any two consecutive supersteps,leading to the low utilization of memory space.In order to address this problem,we propose a new method for fast message generation and exchange between vertices,with high memory utilization and high network-bandwidth utilization.Our approach aims at significant reduction in(ⅰ)the computation workload of each vertex for fast message generation by using a new slimmed-down vertex-centric programming model and(ⅱ)the average message overhead for fast message delivery by designing a light-weight message-centric communication scheme.Moreover,the proposed communication scheme needs not buffer the intermediate messages between any two consecutive supersteps,significantly improves the memory space utilization.Given a cluster,our approach can process larger-scale graph-computing jobs.Based on this method,we implement a high-performance and high memory utilization distributed graph-processing system,called BlitzG.Extensive evaluation shows that BlitzG outperforms the existing distributed graph-processing systems by an average of 20.7x,while saving~78%requirement of memory space. |