
Research On Balanced Graph Partitioning Based On Vertex Cut And Its Implementation On Spark

Posted on: 2017-05-29    Degree: Master    Type: Thesis
Country: China    Candidate: W J Liu    Full Text: PDF
GTID: 2308330503961488    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the explosive growth of data, research on big data processing platforms has flourished, and the emergence of Hadoop led people to move from traditional concurrent computing models to the MapReduce model. To meet the computational requirements of data mining and interactive real-time queries, and to further improve the capability and efficiency of data processing, the AMPLab at UC Berkeley developed the Spark computing platform. Spark introduces the RDD (Resilient Distributed Dataset) data model, which greatly speeds up interactive and iterative computation and makes Spark a powerful tool for big data processing.

Graph processing is one of the cores of data processing, and balanced graph partitioning is an important part of it. Balanced graph partitioning is known to be an NP-complete problem with a wide range of applications, including biological networks, very-large-scale integrated circuit design, parallel programming, load balancing, and online social network analysis. Many traditional graph partitioning algorithms are global algorithms: they must traverse the entire graph and rely heavily on global knowledge of it, so they are not suitable for processing large-scale graph data. In 2013, Rahimian et al. proposed JA-BE-JA, a decentralized local search algorithm that requires no central coordination and only local knowledge of the graph; it greatly reduces the communication overhead among the nodes in a cluster and greatly improves the scalability of the algorithm and its ability to handle large graphs.

However, JA-BE-JA is still designed around edge cuts. Recent studies have found that for natural graphs, although an edge cut keeps the number of vertices in each subset balanced, the number of edges concentrated in each subset can differ greatly, and the algorithm tends to fall into local optima after many iterations. In this thesis we therefore propose JA-BE-JA-VC, an algorithm based on vertex cuts. The new algorithm splits vertices, and partition balance is maintained by the number of edges in each partition rather than simply by the number of vertices, so the partitions it produces on natural graphs are more balanced. At the same time, simulated annealing is introduced to increase the probability of escaping local optima and thus avoid the local-optimum trap. Finally, the algorithm is evaluated; the results show that it not only keeps the partition sizes balanced but also keeps the number of split vertices relatively low, especially on real-world data that follows a power-law distribution, so the vertex-cut-based design has important reference value.
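The abstract describes the algorithm only in prose. As a rough illustration, the following is a minimal, self-contained Scala sketch (Scala being Spark's host language) of the two ideas above: measuring balance by the number of edges per partition under a vertex-cut (edge-to-partition) assignment, and using a simulated-annealing acceptance rule to escape local optima. It is not the thesis's JA-BE-JA-VC implementation; all names, the cost weighting, and the cooling schedule are illustrative assumptions.

import scala.util.Random

object VertexCutSketch {
  type Edge = (Int, Int)   // (source vertex, destination vertex)

  def main(args: Array[String]): Unit = {
    val rnd    = new Random(42)
    val k      = 4                                              // number of partitions
    val edges  = Vector.tabulate(2000)(_ => (rnd.nextInt(200), rnd.nextInt(200)))
    val assign = Array.fill(edges.length)(rnd.nextInt(k))       // random initial edge-to-partition assignment

    // Total vertex replicas: how many partitions each vertex appears in, summed over vertices.
    // Minimizing this corresponds to keeping the number of split vertices low.
    def replicas(a: Array[Int]): Int =
      edges.indices
        .flatMap(i => Seq((edges(i)._1, a(i)), (edges(i)._2, a(i))))
        .distinct
        .groupBy(_._1)
        .values.map(_.size).sum

    // Imbalance measured in edges per partition, as in the vertex-cut balance criterion.
    def imbalance(a: Array[Int]): Int = {
      val sizes = (0 until k).map(p => a.count(_ == p))
      sizes.max - sizes.min
    }

    // Illustrative combined objective; the weight 5.0 is an arbitrary assumption.
    def cost(a: Array[Int]): Double = replicas(a) + 5.0 * imbalance(a)

    var temperature = 2.0
    val cooling     = 0.995
    var current     = cost(assign)

    for (_ <- 1 to 5000) {
      val i   = rnd.nextInt(edges.length)
      val old = assign(i)
      assign(i) = rnd.nextInt(k)                                // propose moving one edge to another partition
      val next  = cost(assign)
      // Simulated-annealing acceptance: always accept improvements,
      // accept worse moves with probability exp((current - next) / temperature).
      val accept = next <= current ||
        rnd.nextDouble() < math.exp((current - next) / temperature)
      if (accept) current = next else assign(i) = old           // keep or revert the move
      temperature *= cooling                                    // cool down over time
    }

    println(s"final cost = $current, replicas = ${replicas(assign)}, imbalance = ${imbalance(assign)}")
  }
}

In a distributed setting such as Spark, each node would apply the same local move-and-accept rule to its own edges using only neighborhood information, which is what makes the decentralized JA-BE-JA family attractive; the single-machine loop above is only meant to show the cost model and the acceptance rule.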
Keywords/Search Tags: Big Data, Graph Partitioning, Distributed Computing, Spark, Vertex-Cut