Font Size: a A A

Research On Heuristic Partition And Management Of Large Graph Data On BC-BSP System

Posted on:2015-01-27Degree:MasterType:Thesis
Country:ChinaCandidate:Q WangFull Text:PDF
GTID:2308330482457085Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
With the rapid development of society, data sets become bigger and bigger, such as facebook, weibo, renren and other social network and innovation gene sequence, etc. Traditional graph processing system in dealing with these based on the calculation on the large data set has obvious short, therefore, we need to develop a highly efficient and stable mass graph data processing system that is used for calculation. Graph data classification is one of the important problems that base on a BSP programming model of large-scale processing system need to be solved. Especially in cloud computing environment, due to the large data size, more graph data need to be divided into multiple partitions, and parallel processing on the cluster computing nodes. However, most existing methods of graph partition need multiple iterations, time complexity is much too high, and the result does not retain the map from vertices to the partition, which is not suitable for BSP model. In addition, in practical application, due to the graph data is basic fixed, repeated graph partition operation reduces the system efficiency. Therefore, how to achieve rapid division as well as how to manage partition has a great challenge.Therefore, we developed BC-BSP system for large computing image processing, which is based on Hadoop cloud computing programming model and the BSP model. In this thesis, we design and implement the system of data classification under the module of data partitioning, the main contributions are as follows:(1) Designed and released DHP, C-DHP graph partition algorithm that based on heuristic rules. In these two kinds of algorithm, we introduced the concept of vertex earnings. The former introduced the concept of virtual barrels, twice applicating vertex income strategies, retains the original data of local topology. The latter for DHP’s optimized:first clustering figure data, then using vertex placed strategy, at last doing the online merged, which provides support for dividing. (2) To manage the partitions, lay the foundation for mergering partitions and calculation. (3) Put forward the idea of divided one time and used multiple times and design the online combiming algorithm. We need to get assignment task when system startup, and merge partitions online, to achieve the purpose. (4) By rewriting class InputFormat, we integrate the above algorithms into BC-BSP system and realized making multiple files as one input split.Though the extensive experiments, the partition algorithm and the online combining proposed in this thesis can finish the function of graph partition of BC-BSP system.The experiment indicates that the cut in C-DHP reduced 25% than Hash, the running time was faster about 40%.
Keywords/Search Tags:graph partition, heuristic, big graph parallel system, partition management, BC-BSP
PDF Full Text Request
Related items