| In recent years, knowledge graphs have developed rapidly and have been widely used in recommendation systems, question answering, intelligent search, and other applications. However, traditional knowledge graph representation methods suffer from low computational efficiency and data sparsity, which seriously restrict the inference, querying, and application of knowledge graphs. Knowledge graph embedding represents the entities and relationships in a graph as low-dimensional dense vectors, which enables efficient semantic computation and plays an extremely important role in knowledge graph applications. However, most previous research on knowledge graph embedding has focused on the construction of embedding models, with implementations and applications limited to small-scale datasets; these methods cannot run on a single computing device when facing knowledge graphs at the billion scale. Parallel training algorithms that use multiple computing devices therefore become an inevitable choice for training large-scale knowledge graph embeddings. Existing training frameworks for knowledge graph embedding still have many problems. In this paper, we investigate graph partitioning-based training frameworks for knowledge graph embedding on single-machine multi-GPU platforms. The main work and innovations are as follows. Existing graph partitioning-based training frameworks for knowledge graph embedding have problems in multi-GPU scheduling. First, PyTorch-BigGraph (PBG) must use smaller partitions to meet the constraints imposed by its scheduling algorithm and GPU memory, and smaller partitions cause problems in terms of IO. Second, when multiple GPUs are scheduled for training, the uneven partitioning of training data and differences in GPU computing speed cause GPU load imbalance and reduce GPU utilization. To solve these problems, we optimize the two-stage partitioning and propose a balanced partitioning 
method and a dynamic scheduling method to improve training efficiency. The main ideas are as follows: to use larger partitions under the same GPU memory limit, we improve the GPU scheduling algorithm and optimize the two-stage partitioning so that training can use a larger number of sub-partitions, which keeps the sub-partition size from growing when larger partitions are used; to balance the amount of training data across GPUs, the balanced partitioning method divides the graph data based on node degree, achieving balanced partitions while preserving the randomness of the training data; and to handle differences in GPU computing speed, the dynamic scheduling method adjusts the task load on different GPUs more flexibly. Compared with PBG, these optimizations shorten the training time by about 30% on the Freebase86m dataset. Although these optimizations effectively alleviate the IO overhead and model convergence problems caused by graph partitioning, they do not eliminate the impact of graph partitioning. To further reduce the IO overhead and model convergence problems, we propose a knowledge graph embedding framework based on fine-grained graph partitioning. The main ideas are as follows: to address the inefficient use of hardware resources by the serial approach in graph partition-based knowledge graph embedding training, we propose a pipeline-based data IO method, which runs IO and GPU training in parallel through pipelining and hides the IO overhead behind GPU computation; to address the model convergence problem caused by the increased number of partitions, we propose a random partition recombination method, which obtains new partitions by random recombination. To combine these methods, we adopt a modular design that realizes the IO of random partitions between disk and memory in the IO module, and the recombination of random partitions and their transfer between 
memory and GPUs in the training module. By combining the two modules, we implement a knowledge graph embedding framework based on fine-grained graph partitioning, which reduces the training time by 32% and improves the model accuracy by 3.7% compared with the PBG method. |
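
The pipeline-based data IO idea summarized above can be illustrated with a minimal producer-consumer sketch. This is not the framework's implementation: `load_partition`, `train_on`, and `pipelined_training` are hypothetical stand-ins, and a background thread with a bounded queue is only one assumed way to overlap partition loading with training.

```python
import queue
import threading


def load_partition(pid):
    # Hypothetical stand-in for reading one partition from disk into memory.
    return {"id": pid, "edges": [(pid, pid + 1)]}


def train_on(partition):
    # Hypothetical stand-in for one round of GPU training on a loaded partition.
    return len(partition["edges"])


def pipelined_training(partition_ids, prefetch=2):
    """Load upcoming partitions in the background while the current one trains."""
    buf = queue.Queue(maxsize=prefetch)  # bounded: at most `prefetch` partitions wait in memory
    done = object()                      # sentinel marking the end of the partition stream

    def io_worker():
        for pid in partition_ids:
            buf.put(load_partition(pid))  # blocks while the buffer is full
        buf.put(done)

    threading.Thread(target=io_worker, daemon=True).start()

    trained = []
    while True:
        part = buf.get()  # waits only when loading is slower than training
        if part is done:
            break
        train_on(part)
        trained.append(part["id"])
    return trained
```

The bounded queue caps how many prefetched partitions are held in memory at once, so the loader cannot run arbitrarily far ahead of training; the `prefetch` value here is an illustrative choice, not one taken from the paper.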