Graphs are a simple and intuitive way to represent data. Many real-world problems can be modeled as graphs, e.g., user relationships in social networks and ratings in e-commerce applications. Systems that operate on graph data are called graph computing systems and can be divided into two types: distributed and single-PC. By increasing the number of machines, distributed graph computing systems achieve data decentralization and load balance, and are mostly used in large-scale data scenarios. Single-PC graph computing systems can be further subdivided into in-memory and out-of-core systems, which handle small-scale and large-scale data respectively. To improve concurrency, systems of this type generally allocate threads according to server capacity, and out-of-core systems additionally apply out-of-core techniques for I/O loading. Under the same resource constraints, single-PC systems tend to outperform distributed systems, mainly because of the higher cost of communication and synchronization in a distributed environment. By adding cheap external storage and optimizing parallel I/O, single-PC out-of-core systems can achieve good performance on large-scale data without consuming expensive distributed resources. There has been a series of work on single-PC out-of-core systems, e.g., GraphChi, X-stream, and GridGraph. All of them are dedicated to external-memory access optimizations for better I/O performance: X-stream and GridGraph reduce I/O latency by accessing external memory sequentially, and GridGraph reduces the number of I/O accesses by shrinking the data size of each I/O load and submitting I/O requests in a coarse-grained way. However, they all ignore optimizations to in-memory access efficiency. As I/O performance improves, in-memory access time gradually becomes prominent and may even become the bottleneck, at which point these three systems no longer dominate in performance.

This paper proposes a single-PC graph computing system oriented to large-scale data that uses parallel I/O techniques. To address the problems above, the new system optimizes the coordination among computation, I/O loading, and memory access. The asynchronous computation-I/O loading model proposed in this paper parallelizes computation and I/O loading, in contrast to the serial execution of the synchronous model used by GridGraph. In the asynchronous model, computation and I/O loading are independent, so separate thread groups can be allocated according to the access requirements of the tasks they execute, avoiding the excessive I/O threads caused by the unified allocation strategy of the synchronous model and achieving full utilization of memory and I/O bandwidth. Moreover, LIBAIO, an asynchronous I/O engine, is used in the system to achieve higher I/O access efficiency and to balance the workload among executing threads. To further optimize memory access on NUMA architectures, the paper proposes a sequential-remote and random-local access strategy, which improves memory access efficiency considerably by avoiding random access to remote memory. To guarantee sequential access to remote data, the system rearranges the graph input by source or destination vertex in the preprocessing phase; to lower the sorting overhead, the paper proposes a strategy combining in-memory sorting with external merge sort. Experimental results show that the proposed system has significant advantages in both I/O performance and memory access efficiency.
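The asynchronous computation-I/O loading model described above can be sketched as a producer-consumer pipeline: a dedicated group of loader threads reads edge blocks while a separate group of compute threads processes blocks already in memory, so the two phases overlap instead of alternating serially. The following is a minimal illustrative sketch, not the paper's implementation; all names (load_block, process-counting via a list, thread counts, block counts) are invented for the example, and the disk read is simulated.

```python
# Sketch of an asynchronous computation / I/O-loading pipeline: loader
# threads and compute threads are decoupled by a bounded queue, so blocks
# are "read from disk" while earlier blocks are being processed.
import queue
import threading

NUM_IO_THREADS = 2       # thread group dedicated to I/O loading
NUM_COMPUTE_THREADS = 4  # thread group dedicated to edge processing
NUM_BLOCKS = 8           # edge blocks of the partitioned graph

block_queue = queue.Queue(maxsize=4)  # bounded: throttles the loaders
processed = []
lock = threading.Lock()

def load_block(block_id):
    # Stand-in for reading one edge block from external storage.
    return [(block_id, e) for e in range(3)]  # a few fake edges

def io_loader(block_ids):
    for b in block_ids:
        block_queue.put(load_block(b))  # overlaps with computation

def compute_worker():
    while True:
        block = block_queue.get()
        if block is None:  # poison pill: no more blocks
            break
        with lock:
            processed.extend(block)  # stand-in for edge processing

# Partition block ids across the I/O thread group.
io_threads = [
    threading.Thread(target=io_loader,
                     args=(range(i, NUM_BLOCKS, NUM_IO_THREADS),))
    for i in range(NUM_IO_THREADS)
]
compute_threads = [threading.Thread(target=compute_worker)
                   for _ in range(NUM_COMPUTE_THREADS)]

for t in io_threads + compute_threads:
    t.start()
for t in io_threads:
    t.join()
for _ in compute_threads:
    block_queue.put(None)  # one poison pill per compute worker
for t in compute_threads:
    t.join()
```

Because the two thread groups are sized independently (here 2 loaders vs. 4 workers), the split can follow the access characteristics of each task, which is the allocation flexibility the asynchronous model provides over a unified thread pool.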
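The preprocessing strategy of in-memory sorting plus external merge sort can likewise be sketched in a few lines: each chunk of edges is sorted in memory and spilled as a sorted run, and the runs are then k-way merged so that the final edge order follows the source vertex. The chunk size, run-file layout, and helper names below are assumptions made for the example, not the paper's format.

```python
# Minimal sketch of the preprocessing step: sort edge chunks in memory,
# spill them as sorted runs, then k-way merge the runs (external merge
# sort) so edges end up ordered by (source, destination) vertex.
import heapq
import os
import random
import tempfile

CHUNK_SIZE = 4  # edges per in-memory run (tiny, for illustration)

def external_sort_edges(edges, tmpdir):
    # Phase 1: in-memory sort of each chunk, written out as a sorted run.
    run_paths = []
    for i in range(0, len(edges), CHUNK_SIZE):
        chunk = sorted(edges[i:i + CHUNK_SIZE])  # sort by (src, dst)
        path = os.path.join(tmpdir, f"run{len(run_paths)}.txt")
        with open(path, "w") as f:
            for src, dst in chunk:
                f.write(f"{src} {dst}\n")
        run_paths.append(path)

    # Phase 2: k-way merge of the sorted runs; only one line per run is
    # held in memory at a time, which is what keeps the sort out-of-core.
    def read_run(path):
        with open(path) as f:
            for line in f:
                src, dst = line.split()
                yield int(src), int(dst)

    return list(heapq.merge(*(read_run(p) for p in run_paths)))

random.seed(0)
edges = [(random.randrange(10), random.randrange(10)) for _ in range(10)]
with tempfile.TemporaryDirectory() as d:
    ordered = external_sort_edges(edges, d)
```

Only the in-memory chunk sort touches the full chunk at once; the merge phase streams the runs, which is why this two-level scheme keeps the sorting overhead of the preprocessing phase low even when the edge list exceeds memory.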