Font Size: a A A

Storage Access Optimization In Distributed Graph Processing System Based On Consistent Hashing

Posted on:2018-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:J H LiFull Text:PDF
GTID:2348330569975114Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph processing system as a type of big data analysis tool has been used in many fields,with the growing scale of nowadays graphs,we need better scalability and parallelism of the graph processing systems,on the one hand,we can use a distributed system to scale out graph processing,on the other hand,we can optimize the memory access to scale up graph processing.Both of the two ways face graph partitioning,and optimal graph partitioning is a classical NP problem,so in most graph processing systems,graphs are partitioned randomly and casually,but this kind of graph partitioning way usually leads to imbalance workload while the graph is processed,and it drags the graph processing speed.In distributed graph processing system,graph data storage is a big problem,the system has to locate an update message,we should store graph data carefully so as to address its destination rapidly.This paper innovatively uses consistent hashing as meta-data to store graph data,and this method has two advantages.The first is rapidly addressing,we use red-black tree to locate the destinations,this tree has the same number of nodes with graph partitions,and usually the graph partitions is about dozens or less,so in the red-black tree,search times would be less than ten times.The second is dynamically graph partitioning,in our system,the heavy load graph partition can be repartitioned into two partitions during graph processing,we evaluate the partition load using mathematical method,we make the two new partitions share the similar load on purpose,and the heavy load partition would no longer exists,so the graph processing will be accelerated.We design and implement our distributed graph processing system named Ecgraph,Ecgraph uses our new graph storage strategy,and owns above two advantages.We test our Ecgraph using different datasets and algorithms,the test results show that one iteration runtime reduces by 30%~50% after three times storage optimization.And we also test our Ecgraph with Chaos,it shows that the Ecgraph has better performance than Chaos,and as for skewed graph such as R-MAT,Ecgraph reduces about 40%~61% time.And from the tests,we can learn when the cluster scale becomes larger,the communication cost will be stable with more machines in the cluster.
Keywords/Search Tags:consistent hashing, dynamic partitioning, graph processing, graph partitioning, storage access optimization
PDF Full Text Request
Related items