Cluster Based Large-scale Distributed Graph Processing System

Posted on: 2018-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Long
Full Text: PDF
GTID: 2428330569475164
Subject: Computer system architecture
Abstract/Summary:
Compared with lists, tables, and trees, a graph is a more complex data structure that can encode information rich in relationships. Many real-world applications can be modeled as graphs, such as airline networks, citation relationships among publications, and links between webpages. Mining such graph data can yield deep insights. In the era of big data, graphs keep growing in size. Given limited computation and storage resources, it is impossible to process these large-scale graphs on a single machine. Current distributed graph processing systems suffer from high communication latency, which heavily affects overall performance. Hence, a more efficient graph processing system can be developed by reducing the communication overhead. Building on the single-machine graph processing system PathGraph, we present a novel distributed graph processing system, DistPathGraph. DistPathGraph uses three methods to reduce communication latency and speed up convergence. First, we propose a new cluster-based partitioning method. The resulting partitions preserve the integrity of paths in the graph while maintaining good load balance and a low repetition rate. Second, taking into account the dependencies among vertex updates, we develop a new scheduling policy. By arranging the order in which vertices are updated, we optimize the time spent waiting for information from nonlocal vertices. Last but not least, we design a package-based communication method to reduce the communication overhead of the whole system. Thorough experiments show that DistPathGraph performs well in both partitioning and iterative computation. In terms of partitioning results, DistPathGraph is better than random hash and PathGraph. The experiments also show that DistPathGraph outperforms GraphLab, achieving up to 6x (from 0.3x) improvements.
Keywords/Search Tags:Distributed, Graph Computation, Iterative computation, Computing scheduling, Cluster
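
The package-based communication idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how DistPathGraph builds its packages, so everything below is an assumption made for illustration: the function names `owner_of` and `pack_updates`, the modulo placement rule, and the `(target_vertex, value)` update format are all hypothetical. The sketch only shows the general technique of grouping vertex updates by destination machine so that one package is sent per destination instead of one message per update.

```python
from collections import defaultdict


def owner_of(vertex_id, num_machines):
    # Hypothetical placement rule (plain modulo), used only for this sketch.
    # It is NOT the cluster-based partitioning described in the abstract.
    return vertex_id % num_machines


def pack_updates(updates, num_machines):
    """Group (target_vertex, value) updates into one package per destination machine."""
    packages = defaultdict(list)
    for target, value in updates:
        packages[owner_of(target, num_machines)].append((target, value))
    return dict(packages)


# Five pending updates addressed to nonlocal vertices (made-up values).
updates = [(0, 0.5), (3, 0.1), (4, 0.2), (7, 0.9), (8, 0.3)]
packages = pack_updates(updates, num_machines=4)

# Each dictionary entry would be sent as a single network message.
for machine, payload in sorted(packages.items()):
    print(machine, payload)
```

With these made-up inputs, the five updates collapse into two packages, one for each machine that owns at least one target vertex. A real system would also need serialization, size limits per package, and a flush policy, none of which the abstract specifies.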