Font Size: a A A

Research On And Application Of Efficient Algorithms For Large-Scale Static Graph Partition

Posted on:2020-10-01Degree:MasterType:Thesis
Country:ChinaCandidate:Z G ZhongFull Text:PDF
GTID:2428330602951881Subject:Engineering
Abstract/Summary:PDF Full Text Request
The social network,web pages' linkage,network of urban road and large-scale integrated circuit are all regarded as a large-scale graph,which generally have over ten million vertices and/or edges i.e.,a kind of big data.The key basis and precondition on these large-scale graphs distributed computing/analysis must evenly divided them in k partitions.Thus,we can take full advantages of cluster resources and improve efficiency of computing/analysis these graphs.Though there are many large-scale graph partition algorithms with the linear programming,multi-level partition,heuristic method,etc.,respectively,the existing algorithms have some rooms to improve their performance and/or accuracy.What is important is that existing algorithms have no consideration to the large-scale graph's structure leading in the poor algorithms' performance and partition quality.Therefore,in the age of big data,the innovative research on and implementation of efficient large-scale graph partitioning algorithms have very great theoretical and practical significance.The thesis stems from the National Natural Science Foundation of China.Aiming to overcome the above weaknesses of existing algorithms,based on proposed a new dynamic label propagation algorithm,the novel multi-objective optimization model and solution of the large-scale graph partition,consideration to the graph structure,etc.,two new efficient and effective large-scale graph partition algorithms are proposed.Meanwhile,author embedded one of the proposed graph partition algorithms into popular graph processing framework Giraph resulting in a new and enhanced large-scale graph processing framework.More specially,the main work and innovation of the thesis are as follows:(1)To overcome the existing algorithms weaknesses,based on proposed new lightweight label propagation strategy,a new efficient large-scale graph partition algorithm BGPLP(Balanced Graph Partition based on Label Propagation)is proposed.More precisely,the author first transforms the large-scale graph partitioning problem into multi-objective optimization one.Then,both the load balance and minimizing communication overhead among the graph's partitions are simultaneously considered during the iterative process of problem solving to efficiently achieve bi-objective optimization of the problem.Meanwhile,a novel dynamically updating vertex's label method is introduced,which can be adaptive to the current graph partition so as to speed up each iteration.And a new algorithm's convergence criterion with two conditions is presented for the algorithm converging more accurately and rapidly.On a large number of large-scale graph datasets,the proposed large-scale graph partition algorithm BGPLP is fully tested and verified on the datasets.The experimental results show that our algorithm BGPLP is much better than the-state-of-art large-scale static graph partitioning algorithms in efficiency and accuracy.(2)To avoid existing algorithms being prone to fall into the local optimization,with the random perturbation and other strategies,an improved multi-phase large-scale graph partition algorithm GPMP is proposed based on our algorithm BGPLP.That is,GPMP first takes advantage of local information of the large-scale graph structure to improve result quality of initial partition.Then,in the iteration process,GPMP employs proposed multi-phase partition strategy,i.e.,the optimizing phase and the escaping local optimization phase,to escaping local optimization and acquiring the optimal graph partition efficiently and effectively.In the optimizing phase,GPMP optimizes quickly result of the large-scale graph partitioning.Then,in the escaping local optimization phase,GPMP randomly disturbs the partitioning result to increase probability of algorithm jumping out of local optimization.These two phases are executed alternately to gradually increase the global optimization capability of GPMP.Moreover,the new method of reserving intermediate optimal partitioning results is introduced by GPMP,which can speed up the partitioning and refrain from the fluctuation of partitioning quality.Finally,our algorithms GPMP and BGPLP are fully test and verified on different large-scale graph datasets.The experimental results indicate that GPMP outperforms better than BGPLP in the efficiency and accuracy.(3)For optimizing the popular graph processing framework Giraph with the traditional low-efficiency partition method,a new enhanced graph processing framework EGiraph is constructed by repleading the default hash graph partition algorithm embedded in Giraph with our GPMP under following the Giraph interfaces.Then,Page Rank and graph's connected components computations are worked on the EGiraph to verify its performance.The two experiments indicate that GPMP algorithm can remarkably reduce runtime of tasks of graph computation and EGiraph has been greatly enhanced in terms of time efficiency.The further improvement and extension of our algorithm GPMP are our future research directions.A new and enhanced GPMP with the ability of well coping with large-scale power law graph and dynamic graph is our final ends.
Keywords/Search Tags:Large-Scale Graph Partition, Label Propagation, Multi-Objective Optimization, Falling Into Local Optimum, Multi-Phase Optimization, Giraph
PDF Full Text Request
Related items