Cache-friendly Data Layout For Massive Graph

Posted on:2017-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Y Ou

Full Text:PDF

GTID:2348330503989807

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Large Scale Graph Processing performs many random reads and writes, and these workloads are typically performed in memory, moreover, a large number of random access can significantly reduce the performance of system, thus, it's crucial to improve the efficiency of memory access. Taking the modern storage tiering architecture into consideration, it will greatly improve the performance of access to memory if we can have more efficient use of Cache, whereas, there is no special method to organize the data in memory to optimize the Cache.To figure out this problem, we propose the Cache-friendly data layout strategies for different scenarios to improve the efficiency of the Graph engines. Combining with the locality of Cache and the adjacent table structure, We use sort to increase the order of the node number to improve the efficiency of Cache. Combining with the graph algorithms' typical operation: traversal and query, we use the classic traversal algorithm BFS(Breadth First Search) to increase the logical sequence of graph data to optimize the efficiency of Graph engines, in addition, the graph data, corresponding to squat BFS tree, have better overall order when layouted, and it can adapt to more algorithms. To sum the above analysis, we designed and implemented the graph preprocessing system CookGraph, and it matches the layout strategy according to the features of graph algorithms and graph datasets' type.We have three graph processing engines, four algorithms and six different types of graph datasets tested, and we find that there is only sort can improve the efficiency of the Cache when processing the CC(Connected Components) on SNAP(Stanford Network Analysis Platform), and the optimization ratio is 11% at least, 78.76% at most for the four ones optimized in six datasets. For GraphChi, BFS can improve the efficiency of the datasets corresponding to squat BFS tree when three algorithms CC, TC(Triangle Counting) and PageRank processed, and the ratios are 15%-20%, 11% and 20%, and BFS can only increase the CC's efficiency when testing five datasets corresponding to slender BFS tree, and the optimal ratio is 50.1% among the four improved datasets.

Keywords/Search Tags:

Large Scale Graph Processing, Graph Algorithms, Graph Datasets, Data Layout, Cache-friendly

PDF Full Text Request

Related items

1	External Storage Layout Optimization For Massive Graph Traversal
2	Research And Implementation Of Graph Mining Platform Based On Pregel-like Framework
3	Research On Key Technologies Of Large-scale Graph Processing
4	Research On The Communication And Cache Techniques In BSP-Based Large Scale Graph Processing Systems
5	Research And Implementation Of Virtual Machine Management System For Large Scale Graph Processing
6	Research On Key Technologies Of Graph-Based Large-Scale Log Processing System
7	Research On The Large-scale Evolution Graph Data Partition Method Based On Vertex-Cut
8	Research On Large Graph Aggregation Algorithm Based On Finite Memory
9	Improving The Performance Of OLAP In Large-Scale Graph Processing System
10	A Research On Graph Computation Platform In A Single PC By Application Of Graph Algorithm Tuned Graph Representation Format