Font Size: a A A

Key Technology Research On Processing And Analyzing Big Graphs

Posted on:2018-09-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:B SuoFull Text:PDF
GTID:1368330563495796Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph data,as an important data type widely used for modeling relationships be-tween entities,are common in computational biology,web,social networks,and etc..As the era of big data is coming,inspired by related techniques,the amount of graph data start to increase explosively.Hence,for the large scale of graph data,how to process them efficiently,to support common operations,and to carry out effective analysis,have become emergency problems and also bring new challenges.Recently,as various programming frameworks and their open-source implementa-tions come out,such as MapReduce and BSP,distributed and parallel frameworks for graph data processing has become the first choice to solve big data related problems,due to their advantages in scalability,tolerance,and usability.Based on the open-source implementations of these frameworks and platforms,researchers,according to the nature of graph data and their applications,have proposed many methods for big graph data processing and analysis.But there still exist following problems and challenges:(1)For graph data processing and analysis frameworks,communication and synchronization bot-tlenecks appear in synchronous graph programming models,how to design and implement a new general framework to solve these bottlenecks.(2)For graph processing solutions,how to achieve iteration optimization through parallel algorithms design.(3)For graph processing and analysis models,confronted with various synchronous and asynchronous frameworks,and algorithms tightly incorporated with their frameworks,how to design a unified programming model to guarantee the correctness of these algorithms when run-ning on different frameworks.In this work,we focus on iterative computation problems in graph processing and analysis.From the perspective of framework optimization,we propose a hybrid processing framework to facilitate iterative computations.For parallel solutions,we propose a parallel subgraph matching solution with iteration optimization concerns.With regard to programming models,we propose a modelling approach based on a unified programming model,and with the help of this model,we design and imple-ment a parallel solution for the community detection problem.Our work was supported in part by the Natural Science Foundation of China and the Graduate Starting Seed Fund of Northwestern Polytechnical University.The main contributions of this thesis can be summarized as follows:(1)For graph processing and analysis frameworks,in order to solve the the it-erative computation bottlenecks in synchronous programming models,we analyse the limitations of standard BSP under different scenarios.In order to reduce the amount of global iterations in graph processing,we propose an optimization approach based on hy-brid processing.By introducing pseudo-supersteps within graph partitions,the proposed approach deals with computations within and between graph partitions differently to cut down the amount of communications and synchronizations and accelerate convergence.Keeping the vertex-centric interfaces,we design and implement GraphHP,a new graph processing framework based on an open-source BSP implementation,and preserve the usability of BSP programming model.The experiments evaluate our improvements in it-eration number,communication cost,and processing speed,and show better performance than existing asynchronous platforms.(2)For parallel graph processing and analysis solutions,focusing on iterative com-putation cost,we propose a parallel subgraph matching solution with optimized iterative process.By abstracting parallel subgraph matching in synchronous programming models as an iterative computation process backboned by join operations,we separate each iter-ation as two successive phases:computation phase,and communication phase.We start our cost analysis with more attention on the iterative process,and propose an iteration optimization approach.Considering both of the amount of iterations and communication cost,the proposed approach carries out from query decomposition and join operation.In query decomposition,we optimize the number of iterations and the amount of partial matches by controlling the size of the decomposed subgraph set and using the discrimi-nations between different vertex labels,respectively.In join operation,we use bushy-tree to construct the join plan and optimize it from three perspectives,which are the join tree height,local join,and join estimation.Also our solution can make use of existing re-searches on subgraph matching for a single machine.Experiments on the implementation of our solution on MapReduce evaluate the performance gain provided by our approach,and show that our solution is better on processing and response speed both than the ones without iteration optimizations.(3)For graph processing and analysis programming models,aiming at the compatibility problem between computing frameworks and parallel solutions,after carefully analysing the differences between various implementations of both synchronous and asyn-chronous programming models,and the tight coherence between algorithm optimizations and corresponding frameworks,we propose a modelling approach based on a unified pro-gramming model,DFA-G,for vertex-centric graph processing platforms.The unified programming model describes graph processing and analysis algorithms as a vertex state transition process driven by messages.It decouples algorithm correctness from message arriving orders and guarantees any graph algorithms designed by it can run correctly on frameworks with asynchronous optimizations.We design and implement a prototype based on DFA-G.Users can create their models by graphic interfaces and get the exe-cutable program by transforming models to codes automatically.Our experiments on a series of typical graph algorithms have proved the correctness and effectiveness of our proposed programming model.(4)In order to use the unified programming model to design and implement a parallel solution for the community detection problem in the real world,after we analyse the background and meaning of a typical graph processing and analysis application,dynamic community detection,we propose a new approach based on information flows and also model a parallel version of the algorithm on our hybrid processing framework with the help of our proposed programming model.According to the comparison between processes of epidemic spreading and information flowing,we use epidemic dynamics model to evaluate the intimacy between entities.We add the information flowing property to the objective function of community detection,and enhance its hierarchy and time-sensitive.Also,we discuss the scalability problem of the centralized community detection algorithm,and model a parallel solution based on the DFA-G programming model.We use the modelling prototype to generate runnable programs on GraphHP to make the algorithm parallelized on distributed clusters.The experiments show its improvement on analysis results and how it enables us to analyse the evolution of the whole social network from a community perspective..
Keywords/Search Tags:graph data, iteration optimization, hybrid processing, graph pattern matching, unified programming model, dynamic community detection
PDF Full Text Request
Related items