Font Size: a A A

Cache-friendly Data Layout For Massive Graph

Posted on:2017-03-04Degree:MasterType:Thesis
Country:ChinaCandidate:Y M Y OuFull Text:PDF
GTID:2348330503989807Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Large Scale Graph Processing performs many random reads and writes, and these workloads are typically performed in memory, moreover, a large number of random access can significantly reduce the performance of system, thus, it's crucial to improve the efficiency of memory access. Taking the modern storage tiering architecture into consideration, it will greatly improve the performance of access to memory if we can have more efficient use of Cache, whereas, there is no special method to organize the data in memory to optimize the Cache.To figure out this problem, we propose the Cache-friendly data layout strategies for different scenarios to improve the efficiency of the Graph engines. Combining with the locality of Cache and the adjacent table structure, We use sort to increase the order of the node number to improve the efficiency of Cache. Combining with the graph algorithms' typical operation: traversal and query, we use the classic traversal algorithm BFS(Breadth First Search) to increase the logical sequence of graph data to optimize the efficiency of Graph engines, in addition, the graph data, corresponding to squat BFS tree, have better overall order when layouted, and it can adapt to more algorithms. To sum the above analysis, we designed and implemented the graph preprocessing system CookGraph, and it matches the layout strategy according to the features of graph algorithms and graph datasets' type.We have three graph processing engines, four algorithms and six different types of graph datasets tested, and we find that there is only sort can improve the efficiency of the Cache when processing the CC(Connected Components) on SNAP(Stanford Network Analysis Platform), and the optimization ratio is 11% at least, 78.76% at most for the four ones optimized in six datasets. For GraphChi, BFS can improve the efficiency of the datasets corresponding to squat BFS tree when three algorithms CC, TC(Triangle Counting) and PageRank processed, and the ratios are 15%-20%, 11% and 20%, and BFS can only increase the CC's efficiency when testing five datasets corresponding to slender BFS tree, and the optimal ratio is 50.1% among the four improved datasets.
Keywords/Search Tags:Large Scale Graph Processing, Graph Algorithms, Graph Datasets, Data Layout, Cache-friendly
PDF Full Text Request
Related items