
Research on Training Acceleration Technology for Graph Neural Networks in GPU Heterogeneous Environments

Posted on: 2023-02-16    Degree: Master    Type: Thesis
Country: China    Candidate: Z J Ran    Full Text: PDF
GTID: 2568307169483094    Subject: Computer Science and Technology
Abstract/Summary:
Graph neural networks (GNNs) have become an important tool for studying graph-structured data and are widely used in social networks, citation networks, recommendation systems, crowdsourcing systems, and other fields. When GPU memory capacity is far smaller than the scale of the graph data, the traditional full-graph training model can no longer keep up with rapidly growing graphs, so heterogeneous training on large-scale graphs remains challenging. Mainstream GNN training systems use sampling to reduce computation and GPU memory usage, but they still face serious difficulties in data loading, training time, and hardware resource utilization. On the one hand, frequent CPU-GPU data transfers and communication during training reduce GPU utilization and prolong training; on the other hand, increasing the amount of training data overloads the CPU, which limits training throughput. Targeting data-transfer efficiency, GPU memory occupation, and related problems, this thesis studies techniques for accelerating GNN training. The main work and contributions are as follows.

To address heavy data-transfer pressure and the large memory footprint of static caches, this thesis proposes a new data transmission scheme, BRGraph. The method detects duplicate node data in host memory before transfer, eliminating redundant transmission, and reuses the original data space on the GPU to reduce memory occupation (a sketch of the idea follows this abstract). Compared with the mainstream graph deep learning framework DGL, the proposed method reduces data transmission by up to 60%, achieves up to 1.79x end-to-end training speedup, and uses 19% to 40% less GPU memory than a static cache scheme, while avoiding the drop in hit ratio that static caches suffer as the data volume grows. The thesis also designs a GPU-based parallel detection algorithm, PBR, which exploits the high parallelism of the GPU to evaluate the matching rules between subgraph arrays, greatly reducing the overhead introduced by the BRGraph scheme and improving GPU utilization (also sketched below). Experiments show that PBR is 4-11x faster than the NumPy library and is suitable for large-scale data scenarios.

To address the low hit ratio of existing static data-caching schemes and their weak integration with transfer-acceleration schemes, this thesis proposes a new caching scheme based on reachable subgraphs. The method builds a reachable subgraph from the dataset partition and removes redundant nodes, so that cached content never becomes invalid, and couples the reachable subgraph with the BRGraph mixed-reuse transmission scheme (S-BRGraph); a sketch follows as well. Compared with the mainstream static caching scheme, the reachable-subgraph cache adapts automatically to the sampled subgraphs and raises the data hit ratio by 2% to 7%; combined with S-BRGraph, the hit ratio rises by at least 12% and transmission efficiency by 23%, while strengthening the integration of transmission and the adjustability of idle resources.
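The abstract does not include BRGraph's implementation; the following is a minimal PyTorch sketch of the redundancy-elimination idea described above. The function name transfer_batch and its arguments are our own illustrative choices, not names from the thesis; it assumes node features live in host memory and that the previous mini-batch's feature rows are still resident on the GPU.

```python
import torch

def transfer_batch(prev_ids, prev_feats_gpu, batch_ids, feats_cpu, device="cuda"):
    # Detect, while still in host memory, which nodes of the new mini-batch
    # were already shipped with the previous batch.
    reused = torch.isin(batch_ids, prev_ids)          # bool mask on CPU
    new_ids = batch_ids[~reused]

    # Transfer only the non-redundant feature rows over PCIe.
    new_feats = feats_cpu[new_ids].to(device, non_blocking=True)

    # Reassemble the batch's feature matrix on the GPU, reusing resident rows
    # instead of re-transferring them.
    out = torch.empty((batch_ids.numel(), feats_cpu.size(1)),
                      dtype=feats_cpu.dtype, device=device)
    pos = {int(v): i for i, v in enumerate(prev_ids.tolist())}   # id -> GPU row
    src = torch.tensor([pos[int(v)] for v in batch_ids[reused].tolist()],
                       dtype=torch.long, device=device)
    out[reused.to(device)] = prev_feats_gpu[src]
    out[(~reused).to(device)] = new_feats
    return out
```

Note that the Python dictionary mapping node IDs to resident rows runs on the CPU; this host-side matching cost is exactly the kind of overhead the thesis's PBR algorithm moves onto the GPU, as sketched next.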
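The abstract does not spell out PBR's matching rules, so the sketch below shows one standard way to parallelize duplicate detection between two subgraph node arrays on the GPU: sort one array once, then run a parallel binary search for every element of the other. It illustrates the kind of workload behind the reported 4-11x speedup over NumPy, not the thesis's actual kernel.

```python
import numpy as np
import torch

n = 2_000_000
a = torch.randint(0, 10_000_000, (n,))
b = torch.randint(0, 10_000_000, (n,))

# CPU baseline, analogous to the NumPy comparison reported in the thesis.
mask_np = np.isin(a.numpy(), b.numpy())

# GPU variant: every element of `a` is checked against a sorted copy of `b`
# concurrently via a parallel binary search.
a_gpu, b_gpu = a.cuda(), b.cuda()
b_sorted, _ = torch.sort(b_gpu)
idx = torch.searchsorted(b_sorted, a_gpu).clamp_(max=b_sorted.numel() - 1)
mask_gpu = b_sorted[idx] == a_gpu

# Both paths agree on which nodes are duplicates.
assert bool((mask_gpu.cpu().numpy() == mask_np).all())
```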
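Finally, a sketch of how a reachable-subgraph cache could be built with DGL, under our reading of the scheme: starting from the training partition, collect every node within the sampler's hop count, so the cache holds exactly the nodes a sampled subgraph can ever request and no entry goes stale. The helper name and the hop-expansion loop are our assumptions, not the thesis's code.

```python
import torch

def reachable_node_ids(g, train_nids, hops=2):
    """Collect every node reachable within `hops` in-edges of the training
    partition of DGL graph `g`. With `hops` equal to the sampler depth,
    these are the only nodes a sampled subgraph can contain."""
    frontier = torch.as_tensor(train_nids, dtype=torch.int64)
    reached = frontier
    for _ in range(hops):
        src, _ = g.in_edges(frontier)      # neighbours feeding the frontier
        frontier = torch.unique(src)
        reached = torch.unique(torch.cat([reached, frontier]))
    return reached

# Hypothetical usage: pin the reachable subgraph's features on the GPU once,
# then serve all sampler lookups from this cache.
# cached = g.ndata["feat"][reachable_node_ids(g, train_nids)].to("cuda")
```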
Keywords/Search Tags: Graph Neural Network, Heterogeneous Training, Data Transmission, GPU Memory Usage