| Graph Neural Networks(GNN)is a generic term for algorithms that use deep neural networks to learn graph data.It combines graph broadcast operations with deep learning algorithms to allow graph structure information and vertex attribute information to participate in the learning.Graph neural networks are often used in distributed systems and large-scale data scenarios.However,the existing deep learning frameworks do not provide efficient storage support and message passing support for GNN training,which limits its usage on large-scale graph data.At present,many works have explored the design and implementation of largescale GNN systems based on the data characteristics of graph structure and the computational characteristics of GNN.Thesis introduces the work of the current GNN system,analyzes the system from many aspects,and uses part of the open source GNN system for experimental evaluation.Through the summary and experimental analysis of the current work,it is found that due to the dependence of the vertices in the calculation of the GNN,the small batch parallel calculation method adopted by the existing system has the problems of high memory complexity and accuracy loss caused by sampling.It affects the scalability of the system and the effectiveness of the model,while limiting the further development of GNN.In order to solve the above problems,thesis first briefly summarizes the development of GNN and summarizes the challenges that need to be faced in designing GNN systems.Through the analysis of the typical model of GNN,it is found that the calculation process of forward calculation and back propagation of GNN can be decomposed into DNN and graph propagation model.By recursively decomposing the calculation between vertices,the update of each vertex can be regarded as a process in which independent samples obtain neighbor vertex features through communication and aggregate them,thereby solving the dependency problem between vertices.A GNN full-batch gradient descent parallel calculation method is proposed in this paper,which uses the communication between vertices to resolve dependencies without neighbouring vertices sampling,which greatly reduces storage complexity,ensures model accuracy,and reduces redundant calculations.In this parallel computing method,communication between vertices needs to be carried out during the calculation process,and the current system for expressing neural network calculations based on calculation graphs does not support this communication operation.Therefore,thesis designs and implements a distributed graph neural network framework GFrame based on the GNN full-batch gradient descent parallel computing method.GFrame combines the necessary functions of tensor abstraction and automatic differentiation in the DNN framework to execute neural networks,the GNN model is expressed on the graph engine,and the optimized partition and communication of the graph engine are used to support efficient distributed GNN training.The framework mainly includes two modules of calculation and communication.GFrame is different from other systems.It uses a combination of graph engine and deep neural network framework.Therefore,according to the point-to-point communication method of the graph engine and the full-batch parallel computing method,thesis compares the The computing framework and communication framework are designed and implemented.In the calculation framework,the tensor calculation part is combined with the DNN framework to implement,and the distributed graph engine is combined to complete the graph propagation part.At the same time,the model is back-propagated for recursive decomposition,and the chain derivation rule is used for calculation.The communication framework includes the realization of parameter synchronization and graph engine communication to obtain characteristic data.Finally,GFrame is compared with existing open source frameworks,analysis and experiments can prove that the framework can achieve better results in performance,storage complexity and other aspects.In the case of a single machine,the training accuracy of the framework is basically the same as the original thesis of the model,and the single machine performance is better in the existing single machine system.Due to the design and optimization of the framework,the memory overhead on the dense graph is the smallest among all open source systems,and the largest memory consumption in the remaining systems is nearly 10 times that of the framework.At the same time,the accuracy error of the framework in a distributed situation is smaller than the open source system Euler,and the performance is better than AliGraph. |