Font Size: a A A

Design And Implementation Of Distributed Graph Processing System And Distributed Query System

Posted on:2018-10-17Degree:MasterType:Thesis
Country:ChinaCandidate:J X ShiFull Text:PDF
GTID:2428330590977766Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Graph structure is becoming more and more popular,social applications,page links,user preferences for goods can be represented as large graphs.Processing and querying such large-scale graph has become a very important problem.With the increasing amount of data,the algorithm becomes more and more complex,the distributed graph processing and graph query become hot research topics.Natural graphs,such as web graphs,email and instant messaging graphs,or social networks,usually exhibit skewed distribution,such as power-law degree distribution.Most of the vertices have relatively few neighbors,while a few vertices have many neighbors,such as social network celebrity.Unfortunately,existing graph-processing systems usually adopt a "one size fits all" design where different vertices are equally processed,leading to suboptimal performance and scalability.More and more public knowledge data is stored as RDF graphs.However,the existing distributed RDF query system cannot meet the requirements.The existing distributed RDF query system often use all of the resources to handle a request,even it's a simple request.Due to the need to synchronize the machines,requests usually have high latency,and due to the use of all the resources,the system cannot handle concurrent requests well,which leads to low throughput.The main contributions of this paper are as follows:We make a comprehensive analysis of existing graph-processing systems over skewed graphs and argue that the skewed vertex distribution demand differentiated computation and partitioning on low-degree and high-degree vertices.Based on our analysis,we introduce PowerLyra.PowerLyra uses random and heuristic hybrid-cut algorithm to partition different types of vertices for a skewed graph,to reduce the replication factor.PowerLyra use hybrid computation engine and locality-conscious graph layout optimization to reduce the communication cost and improve data locality.To solve the problems of the existing distributed graph query system,we design and implement a distributed graph query system Wukong,based on the characteristics of RDMA.Wukong extends the traditional directed graph model by adding index vertices,uses hybrid-cut algorithm to partition normal vertices and index vertices,and uses predicate based store to store RDF data.Wukong replaces the existing one-step pruning method with the full-history pruning,and dynamically chooses execution mode.Since each thread of Wukong can execute a query by itself,the throughput of Wukong is very high.In this paper,the performance of PowerLyra and Wukong are evaluated in detail.The evaluation shows that PowerLyra outperforms PowerGraph by up to 5.5X for real-world graphs and 3.3X for synthetic graphs.Compared to TriAD and Trinity.RDF,the improvement of Wukong is 11.9X for single query,and 740 X for throughput.
Keywords/Search Tags:Distributed graph processing, Graph partition, Distributed graph query, RDMA
PDF Full Text Request
Related items