Font Size: a A A

Research On Distributed Graph Computation With Topology Refactorization

Posted on:2019-02-07Degree:MasterType:Thesis
Country:ChinaCandidate:J Y WangFull Text:PDF
GTID:2428330623450693Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer science and technology,graph theory knowledge and algorithms involved in web search,social network,bioinformatics and other fields have been widely used and developed.The scale of naturally generated graph data shows an explosive growth,and the perfecting and improving of distributed computing technology make the field of distributed graph computing become the hot research direction of academia and industry.However,it is one of the research hotspots to improve the computational performance of distributed graph that the reasonable division and storage of large-scale graph data can reduce the communication among the machine nodes.In addition,the distributed graph computing system in the realization of distributed computing model,often there will be redundant computing or redundant communication problems.Therefore,the optimization and implementation of distributed computing model is also a key issue in the performance optimization of distributed graph computing system.TopoX is an efficient distributed graph computing system based on PowerGraph distributed graph computing system.In this paper,by studying TopoX distributed graph data partitioning and distributed graph computing model,this paper discusses the distributed graph computing system performance optimization technology.The research focuses on the distributed graph segmentation algorithm based on topological reconstruction and GAS distributed computing model based on incremental change.The main contents of this paper include:First,we deeply study and analyze the existing graph data partitioning algorithms in distributed graphs,and reveal their respective limitations when analyzing the advantages of their research results.At the same time,the existing distributed graph calculation model and its implementation in the actual distributed graph calculation system are deeply analyzed.The advantages and disadvantages of the existing distributed graph calculation system are pointed out.Aiming at the inefficient partitioning of large-scale graph computation,a distributed graph segmentation algorithm based on topological reconstruction is proposed.By analyzing the naturally generated graph data,we found that the vertex degree of the natural graph is unevenly distributed,which often results in inefficient partitioning of large-scale graph computation.As a result,uneven computing load occurs on all the machine nodes and causes excessive inter-node communication cost.The proposed method of effectively utilizing the graph topology information can be used to convert the unbalance graph(power law distribution graph)to a more balanced topology before partitioning.Specifically,we perform a fusion operation on a set of(adjacent)low vertices that merge into a hyper vertex and perform a fission operation on the segmented height vertices into a set of sub-vertices so that the calculations involved in all the vertices are divided More uniform.The result of refactoring can be further partitioned by all the compute nodes by the even assignment of hyper vertices and sub vertices to further ensure the compute load balancing of each node.Thirdly,aiming at the problem of redundancy calculation and redundant communication which generated by the existing distributed graph calculation model GAS in the actual implementation,a GAS distributed calculation model based on incremental change is proposed.By analyzing the standard GAS model and its distributed computing process implemented on PowerGraph,it is found that the standard GAS model has redundant overhead in computation and communication during distributed computing.We use each iteration of each vertex to calculate the phase compared with the incremental data generated by the previous iteration,it can be used as transmission data to replace the vertex update data and activation information,which can effectively reduce the calculation process and redundant communication overhead.At the same time,for multidimensional data,the GAS model is optimized on the premise of the graph segmentation algorithm using topology reconstruction and multidimensional segmentation integration to provide inter-layer communication and ensure the interaction and aggregation of data between layers and effectively improve the computing efficiency of the distributed graph computing system.This paper explores the optimization of large-scale graph data partitioning algorithm and distributed graph computing model.The research results have certain theoretical value and significance to the performance improvement of distributed graph computing system.Finally,this paper verifies the performance advantages of TopoX through a large number of experiments.
Keywords/Search Tags:Distributed Graph Computation, Graph partitioning, Topological Reconstruction, GAS model
PDF Full Text Request
Related items