Font Size: a A A

External Storage Layout Optimization For Massive Graph Traversal

Posted on:2016-10-04Degree:MasterType:Thesis
Country:ChinaCandidate:L HuangFull Text:PDF
GTID:2348330479453374Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
When processing large-scale graph data in external storage mode in graph process systems, due to the randomness of application access and the low locality of graph data, it causes a large number of random I/O requests,which affects the cache hit rate and weakens the performance of I/O. Therefore, how to optimize the I/O performance of the graph processing systems in external storage mode is the key to improve the efficiency of graph processing.By analyzing and comparing in I/O patterns of graph process systems,including X-Stream, GraphChi and PrefEdge, we discuss the advantages and limitations of each scheme. Without changing the graph processing model or user operation complexity, we propose the I/O optimization scheme of graph data layout based on the logic relevance of graph structure, and design a method named BOGL,mapping graph node number in BFS order to improving the sequence and locality of graph traversal access. We establish the BFS access tree of BOGL, give the adjacency matrix of graphs, and analyze the average sequential I/O requests and the distribution of non-zero elements of the adjacency matrix of graphs before and after the reorganization. It demonstrates that the BOGL algorithm improves the sequence and locality of graph traversal. We develop a processing engine named BOGLE, including the preprocessing module, loading module, calculating module and prefetching module. Compared to X-Stream and GraphChi, BOGLE can be combined with the existing graph processing system,without changing the original graph programming model or increasing the operation complexity; compared to PrefEdge,which designed specifically for process optimization of cache, the use of BOGLE can further improve the efficiency of the I/O.By reorganizing the graph data from different types and sizes,we compare the performance of application before and after reorgnization, and verify the validity of the BOGL strategy. We use BOGLE in SimPrefEdge and GraphChi, for high density, high average degree data sets, the I/O performance of graph traversal has been improved, thus improving the graph processing efficiency.
Keywords/Search Tags:External Mode Graph Processing System, Large-Scale Graph Traversal, Data Layout, I/O Optimization
PDF Full Text Request
Related items