Font Size: a A A

Design And Implementation Of Distributed Graph Computing Engine

Posted on:2021-02-25Degree:MasterType:Thesis
Country:ChinaCandidate:Y P HuFull Text:PDF
GTID:2428330623468540Subject:Engineering
Abstract/Summary:PDF Full Text Request
The application of Internet technology has led to the explosive growth of the real-life data.However,the analysis and processing of massive data have always been the focus when the industry deals with massive data.Graph model,which is the abstraction of real-life objects and connections,is able to express the attributes and dependencies of data more powerful when compared to the traditional relation model.Distributed data processing systems,such as Spark,MapReduce,mainly focus on traditional data analysis and these systems take batch processing as the principal thing so that are not able to construct an iterative computation based on graph data.The processing system based on graph data focuses on the advantages of graph model,and calculates graph vertices data iteratively in the mode of GAS(Gather-Apply-Scatter),so as to realize distributed real-time analysis and processing of large-scale graph data.In this paper,we design and implement a distributed graph computing engine based on graph data.Based on the typical graph computing systems such as Pregel and PowerGraph,this paper focuses on the analysis of the impact of data organization and iteration mode on the system performance.The main work of graph computing engine is as follows:(1)Design and implement the memory meta data and data management structure of the distributed graph computing engine:Based on the existing graph partition algorithm,design and implement the balance k-way vertex cutting strategy based on the greedy strategy with the edge as the loading granularity,load the graph data from the graph storage engine to the graph computing engine's memory in a distributed way;design and implement the memory based vertex primary and secondary storage structure,and propose an interface for quick access and modification of vertex data,including its mirror data,adjacent vertex data and edge data in the iterative calculation process.In the calculation process,the primary and secondary storage of metadata is carried out to avoid the problem that data cannot be accessed due to single point failure.(2)Design and implement the synchronous iterative engine of the distributed graph computing engine:design the synchronous iterative graph computing model,decompose the execution entity with super step and sub step,bind the execution entity with computer resources,and formulate scheduling strategies to optimize the iteration efficiency.Combined with the characteristics of GAS iteration mode in PowerGraph and BSP iteration mode in Pregel,a calculation model based on super step iteration and synchronization is designed.The calculation model is implemented by multithreading,and the user-defined GAS program uses the shared library as the plug-in to load,decoupling the business logic and the calculation engine logic.An incremental cache model is designed and implemented,which is transmitted in advance The incremental method reduces the network transmission overhead in the step over iteration.(3)The test results show that the system can meet the needs of load balancing and computing proximity in graph partition algorithm,and the iterative model based on GAS can successfully improve the efficiency of graph data processing.
Keywords/Search Tags:Graph Data Processing, Real-time Computing, GAS Iteration, Distributed Computing
PDF Full Text Request
Related items