Font Size: a A A

Improved Method For Disk-based Large Scale Graph Computing Engine

Posted on:2016-05-07Degree:MasterType:Thesis
Country:ChinaCandidate:Y F JiangFull Text:PDF
GTID:2308330476953460Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In this paper, we study on the performance of the graph computing engine. Many real objects, such as web graphs and social networks, can be converted into graphs and efficiently processed by various graph processing tools. GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. It can achieve data processing performance close to and even better than those of mainstream distributed graph engines.In this paper we improve the performance by the following two parts:1) the memory utilization scheme and 2) the algorithm of the graph computing engine.In the memory utilization scheme, we are firstly motivated by the concepts of “pin” from TurboGraph and “ghost” from GraphLab and then propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi engine performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine.Extensive experiments are performed with large real datasets(including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm.Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computationalperformance.Then we focus on the optimization algorithm of the graph computing engine. Since the Six Degrees of Separation shows that we can stop compute the less important vertices when they satisfy certain conditions,we do some experiments in big datasets to compare the optimal algorithm with the original algorithm and show the correctness and effectiveness of this optimization algorithm. Then we do several researches in the convergence value and the portion of the minor vertices and found the close relationship with the performance of the graph engine.
Keywords/Search Tags:Graph process, Big data, GraphChi, Part-in-memory Mode, Six Degrees of Separation
PDF Full Text Request
Related items