Improved Method For Disk-based Large Scale Graph Computing Engine

Posted on:2016-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Jiang

Full Text:PDF

GTID:2308330476953460

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

In this paper, we study on the performance of the graph computing engine. Many real objects, such as web graphs and social networks, can be converted into graphs and efficiently processed by various graph processing tools. GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. It can achieve data processing performance close to and even better than those of mainstream distributed graph engines.In this paper we improve the performance by the following two parts:1) the memory utilization scheme and 2) the algorithm of the graph computing engine.In the memory utilization scheme, we are firstly motivated by the concepts of “pin” from TurboGraph and “ghost” from GraphLab and then propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi engine performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine.Extensive experiments are performed with large real datasets(including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm.Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computationalperformance.Then we focus on the optimization algorithm of the graph computing engine. Since the Six Degrees of Separation shows that we can stop compute the less important vertices when they satisfy certain conditions,we do some experiments in big datasets to compare the optimal algorithm with the original algorithm and show the correctness and effectiveness of this optimization algorithm. Then we do several researches in the convergence value and the portion of the minor vertices and found the close relationship with the performance of the graph engine.

Keywords/Search Tags:

Graph process, Big data, GraphChi, Part-in-memory Mode, Six Degrees of Separation

PDF Full Text Request

Related items

1	Research On Routing Protocols Of Opportunistic Networks Based Six Degrees Of Separation
2	Design And Implementation Of Data Mining System On Telecom Billing
3	Based In The Six Degrees Of Separation And Centrality To Identify The Key Figure In The Micro-blog
4	A Study On The Network Topology Generation Algorithm Based On Six Degrees Of Separation Theory
5	Research On External Memory Based I/O Deduplication Optimized Multi-Task Graph Processing System
6	I/O Method Based On Active Vertex For Semi-external Memory Graph Computation System
7	Research On The Method Of Extracting And Analyzing The Memory Of Android Aplication Process
8	Multi-Task External Memory Graph Processing System Based On I/O Scheduling
9	Compiler Optimizations For Software-Managed On-Chip Memory
10	A ReRAM Based Graph Traversal System With Low Communication Cost