Font Size: a A A

Research On Several Distributed Graph Processing Algorithms And A Unified Graph Programming Framework

Posted on:2022-07-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z K WangFull Text:PDF
GTID:1480306728476704Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Many real-world data(such as social networks,WWW,and bioinformatic networks)can be modeled as graphs.With the rapid development of information technology,the scales of real-world graph data grow rapidly.The performance requirements of large-scale graph analysis are also increasing.Processing big graphs with the distributed computing technology becomes imperative.The related research on distributed graph processing can be divided into two categories: algorithm research and programming framework/system research.In terms of algorithm research,large-scale graph processing faces the difficulties of high computational complexity and long execution time.The thesis proposes efficient distributed algorithms based on the task-parallel paradigm for the vertex similarity calculation problem,the batch subgraph enumeration problem,and the incremental subgraph enumeration problem,respectively.In terms of system research,the existing distributed graph processing systems lack a unified graph programming model and a unified programming framework,making the systems difficult to use.Users are facing high learning costs and program migration costs,impeding the adoption of distributed graph processing systems.To improve the usability of distributed graph processing,the thesis develops a unified graph programming model and framework.The main contributions of the thesis include:(1)An execution-cost-estimation-based threshold-adaptive distributed vertex similarity calculation algorithm.The existing distributed vertex similarity calculation algorithms are only efficient in a narrow range of similarity thresholds.To overcome the drawback,the thesis proposes an execution-cost-estimationbased threshold-adaptive distributed vertex similarity calculation algorithm VSIM.Based on the task-parallel paradigm,VSIM generates an independent vertex similarity calculation task for each vertex.Each task adaptively selects the most efficient execution mode according to an execution cost estimation model.Experimental evaluation indicates that VSIM can run efficiently under a wide range of thresholds,overcoming the drawback of the existing distributed algorithms.VSIM can achieve noticeable performance improvement over the cutting-edge distributed algorithms.(2)A backtracking-based distributed subgraph enumeration algorithm.The cuttingedge distributed subgraph enumeration algorithms solve the problem via distributed multi-way join.However,they have to shuffle intermediate results during the join,bringing high communication costs.To overcome the drawback,the thesis proposes a backtracking-based distributed subgraph enumeration algorithm Batch-BENU.Batch-BENU decomposes the problem into a group of local search tasks that can be executed in parallel.Each task searches isomorphic subgraphs of the pattern graph in a different neighborhood of the data graph,guided by backtracking-based execution plans.Batch-BENU avoids shuffling intermediate results via the on-demand shuffle technique and further reduces the communication cost via the local cache technique.Experimental evaluation indicates that Batch-BENU achieves significant speedups compared with the cutting-edge distributed algorithms CBF and Bi GJoin.It also achieves near-linear node scalability.(3)A matching-mode-conversion-based distributed incremental subgraph enumeration algorithm.The existing distributed incremental subgraph enumeration algorithms have the defects of high memory usage or high communication costs.To overcome the drawbacks,the thesis proposes a matching-modeconversion-based distributed subgraph enumeration algorithm Streaming-BENU.Streaming-BENU converts the incremental subgraph enumeration problem into the batch subgraph enumeration problem based on the concept of incremental pattern graphs.It generates local search tasks for the vertices affected by the update operation at each time step.Each task searches incremental matching results starting from the updated edges,guided by backtrackingbased incremental execution plans.Experimental evaluation indicates that the response time of Streaming-BENU can be much less than the cutting-edge distributed algorithm Delta-Bi GJoin.With big batch sizes and small cache capacity,Streaming-BENU can achieve near-linear node scalability.(4)A unified distributed graph programming model and a programming framework.The existing distributed graph processing systems lack a unified graph programming model and a unified programming framework.It makes the learning cost and the program migration cost high.To make the distributed graph processing systems easier to use,the thesis proposes a unified distributed graph programming model Pregel X.Pregel X shields users from the differences in different graph programming models,hides the technical details of underlying distributed graph processing systems,and decouples the programming interface and the underlying distributed computing frameworks.Based on Pregel X,the thesis further develops a unified distributed graph programming framework Uni GPS.Uni GPS proposes the execution isolation technique based on inter-process communication.The technique makes the existing distributed graph processing systems able to call user-defined functions written in Python.Experimental evaluation indicates that Uni GPS supports complicated graph algorithms(VSIM,Batch-BENU,and Streaming-BENU,etc.)and provides an easy-to-use programming interface.The scale of graph datasets that Uni GPS can handle is much larger than the stand-alone Python graph processing library Network X.For the same dataset,the execution time of Uni GPS is much lower than Network X.
Keywords/Search Tags:graph computing, big data parallel processing, distributed graph algorithm, vertex similarity, subgraph enumeration, distributed graph processing system, programming model
PDF Full Text Request
Related items