As a powerful graph analysis tool, deep learning on graphs has been widely applied to solve various real-world problems with great success. With the rapid growth of graph data, how to efficiently train large-scale graph models has become a hot research topic in both academia and industry. Existing large-scale graph model training systems suffer from challenges including low graph sampling efficiency, expensive hardware configuration requirements, and limited training scalability. To address these challenges, this thesis focuses on training large-scale graph models on a single machine with multiple GPUs. It conducts in-depth research on GPU-based graph sampling on large graphs, subgraph generation for large-scale heterogeneous graphs on multiple GPUs, and subgraph-based large-scale graph model training, in order to design and implement a large-scale graph model training system with GPU-based graph sampling. Specifically, the following works are completed:

1) To address the challenges of large memory footprint and expensive transfer cost in GPU-based graph sampling, this thesis proposes the Chunk-based Graph Compression (CGC) algorithm. CGC combines a novel linear-estimation compression algorithm with a hybrid coding method to balance compression ratio and decompression performance. Experiments and theoretical analysis show that CGC effectively reduces graph file size and transfer cost while supporting fast access to compressed neighbor lists at constant computational complexity (a simplified sketch of the chunk layout follows this abstract).

2) Based on the CGC algorithm, this thesis implements GraSS, a fast sampling system for compressed graphs. GraSS adopts the transit-parallel method (also sketched after this abstract) and implements various graph sampling methods on compressed graphs with a single GPU, thereby improving the end-to-end performance of graph learning. Empirical results on several real-world and synthetic graphs demonstrate that GraSS supports various graph sampling methods on large graphs with high efficiency in cases where state-of-the-art solutions cannot complete the sampling task.

3) Extending to a single machine with multiple GPUs, this thesis designs and implements a computing-subgraph generation technique on a multi-GPU system to handle graphs of larger size and more complex structure. It adopts a sharding strategy to improve the parallel performance of subgraph generation and efficiently completes the two-stage subgraph generation task consisting of graph partitioning and graph sampling. The effectiveness of this technique is verified on a real-world large-scale heterogeneous graph with billions of vertices and edges, providing training data for large-scale graph model training systems with GPU-based graph sampling.

4) This thesis designs and implements a large-scale graph model training system with GPU-based graph sampling. Based on the subgraph training strategy, the system can train graph models on large-scale heterogeneous graphs on a single machine with multiple GPUs. It mitigates the accuracy degradation caused by subgraph training through a random subgraph combination strategy (sketched after this abstract). Experiments on a large-scale graph show that the system can efficiently complete graph model training even when the available GPU computing resources are halved.
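
The following sketch illustrates, in simplified Python, one way chunk-based compression with linear estimation can support constant-time access to a compressed neighbor list, as described for CGC in item 1). The chunk size, the residual layout, and the omission of the hybrid coding step are assumptions made for illustration only; they are not the actual CGC design.

```python
# Illustrative sketch: compress a sorted neighbor list in fixed-size chunks by
# fitting a line through each chunk and storing small residuals. Chunk size
# and residual representation are assumptions, not the CGC specification.
CHUNK = 32  # assumed chunk size

def compress_chunk(neighbors):
    """Fit a line through a sorted chunk and keep per-element residuals."""
    n = len(neighbors)
    first, last = neighbors[0], neighbors[-1]
    slope = (last - first) / (n - 1) if n > 1 else 0.0
    residuals = [v - round(first + slope * i) for i, v in enumerate(neighbors)]
    width = max(abs(r) for r in residuals).bit_length() + 1  # bits incl. sign;
    # a real implementation would bit-pack residuals at this width.
    return {"first": first, "slope": slope, "width": width, "residuals": residuals}

def compress(neighbors):
    """Split a sorted adjacency list into fixed-size chunks and compress each."""
    return [compress_chunk(neighbors[i:i + CHUNK])
            for i in range(0, len(neighbors), CHUNK)]

def access(chunks, idx):
    """Constant-time access: locate the chunk, then reconstruct one element."""
    c = chunks[idx // CHUNK]
    off = idx % CHUNK
    return round(c["first"] + c["slope"] * off) + c["residuals"][off]

adj = sorted([3, 7, 8, 15, 21, 22, 40, 41, 57, 90, 91, 120])
packed = compress(adj)
assert all(access(packed, i) == v for i, v in enumerate(adj))
```

Because every element is reconstructed from the chunk's line parameters plus one fixed-width residual, any neighbor can be decoded without decompressing the rest of the list, which is the property behind constant-complexity access.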
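
The transit-parallel method mentioned in item 2) organizes sampling work by the transit vertex whose neighbor list is being read, so that every sample needing the same vertex shares one adjacency-list access; on a GPU this grouping maps naturally to thread blocks. The CPU sketch below only illustrates the grouping idea; the data layout, fan-out, and kernel organization of GraSS itself are not shown, and the helper names are hypothetical.

```python
# Illustrative sketch of one transit-parallel sampling hop on the CPU.
import random
from collections import defaultdict

def transit_parallel_step(graph, frontiers, fanout, rng=random):
    """frontiers[s] holds the current transit vertices of sample s."""
    # Group sample ids by transit vertex instead of iterating sample by sample.
    by_transit = defaultdict(list)
    for s, transits in enumerate(frontiers):
        for t in transits:
            by_transit[t].append(s)

    next_frontiers = [[] for _ in frontiers]
    for t, sample_ids in by_transit.items():
        neighbors = graph[t]          # the adjacency list is read once ...
        for s in sample_ids:          # ... and reused by every sample that needs it
            k = min(fanout, len(neighbors))
            next_frontiers[s].extend(rng.sample(neighbors, k))
    return next_frontiers

graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
seeds = [[0], [2]]                    # two samples rooted at vertices 0 and 2
print(transit_parallel_step(graph, seeds, fanout=2))
```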
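
Item 4) mentions a random subgraph combination strategy for reducing the accuracy loss of subgraph training. The abstract does not specify how subgraphs are combined, so the sketch below shows one plausible, simplified variant in which each training step unions a few randomly chosen partitions and restores the edges between them; the merging rule and all names are assumptions.

```python
# Illustrative sketch of a random subgraph combination step (assumed variant).
import random

def combine_subgraphs(partitions, full_edges, k, rng=random):
    """Pick k random partitions and build the subgraph induced by their union,
    which restores cross-partition edges that partitioning had cut."""
    chosen = rng.sample(partitions, k)
    vertices = set().union(*(p["vertices"] for p in chosen))
    edges = [(u, v) for (u, v) in full_edges if u in vertices and v in vertices]
    return {"vertices": vertices, "edges": edges}

full_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
partitions = [{"vertices": {0, 1}}, {"vertices": {2, 3}}]
for step in range(3):                 # each step trains on a fresh combination
    sub = combine_subgraphs(partitions, full_edges, k=2)
    # train_one_step(model, sub)      # hypothetical training call
```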