
Research On Distributed Knowledge Representation Learning Framework Over Large-scale Knowledge Graphs

Posted on: 2022-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: S C Dong
Full Text: PDF
GTID: 2558307154474474
Subject: Computer Science and Technology

Abstract/Summary:
Knowledge representation learning (KRL) aims to embed the entities and relations of a knowledge graph into a continuous vector space, preserving the graph's structural and semantic information while reducing computational complexity, thereby improving performance on downstream machine learning tasks. Traditional knowledge representation learning models achieve high accuracy on small benchmark datasets, but the storage and compute limits of a single machine prevent them from being applied directly to large-scale knowledge graphs. It is therefore important to combine knowledge representation learning models with distributed systems into a distributed framework that supports representation learning on large-scale knowledge graphs. At the same time, existing knowledge representation learning models vary widely in their implementations, and there is a lack of a unified distributed framework to facilitate knowledge representation learning applications. This paper presents its work from two aspects: the design of a distributed representation learning framework, and optimizations based on key embeddings.

Design of a distributed representation learning framework based on parameter servers. The main difficulties are how to process training data and model parameters in a distributed cluster, and how to unify the implementations of existing knowledge representation learning models into a template algorithm. This paper designs and implements PDKE, a distributed representation learning framework that supports large-scale knowledge graph training on parameter servers. It adopts the template method design pattern to expose translation distance models through a unified abstract interface, and implements a distributed training algorithm that accommodates typical translation distance models.

Optimization of the distributed representation learning framework based on key embeddings. During distributed training, once a knowledge graph reaches tens of millions of nodes and hundreds of millions of edges, network communication gradually becomes the main bottleneck limiting system performance. Exploiting the sparsity of parameter updates during iterative computation, two optimization strategies are proposed, high-frequency parameter caching and a partial staleness algorithm, which trade a small increase in computation time for a significant reduction in network communication time and total training time.

Experiments are conducted on link prediction, a typical knowledge representation learning task. The PDKE framework, which contains the distributed representation learning template algorithm, and the HotKE framework, which adds the key embedding optimizations, are evaluated on three knowledge graphs. The results show that both frameworks achieve accuracy comparable to traditional KRL models while supporting distributed training on large-scale knowledge graphs. Compared with current state-of-the-art distributed representation learning systems, HotKE reduces network communication time by up to 76% and total training time by up to 73% without loss of accuracy, validating the accuracy, efficiency, and scalability of the distributed representation learning framework and optimization strategies proposed in this paper.
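As a hedged illustration of the template method design described above (the class and function names are assumptions for this sketch, not PDKE's actual API), the shared margin-based training step can live in a base class while each translation distance model, such as TransE, overrides only its scoring function:

```python
# Minimal sketch of the template method pattern for translation
# distance models; names are illustrative, not PDKE's real API.
from abc import ABC, abstractmethod

import numpy as np


class TranslationDistanceModel(ABC):
    """Base class owning the shared margin-based ranking loss."""

    def __init__(self, margin: float = 1.0):
        self.margin = margin

    @abstractmethod
    def score(self, h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
        """Model-specific plausibility score (lower = more plausible)."""

    def loss(self, pos, neg) -> float:
        # Margin-based ranking loss shared by all translation models:
        # push the positive triple's score below the negative one's.
        return max(0.0, self.margin + self.score(*pos) - self.score(*neg))


class TransE(TranslationDistanceModel):
    """Concrete model: score(h, r, t) = ||h + r - t||_1."""

    def score(self, h, r, t):
        return float(np.linalg.norm(h + r - t, ord=1))


# Toy usage: a positive triple and a tail-corrupted negative sample.
h, r, t = np.random.randn(3, 50) * 0.1
t_corrupt = np.random.randn(50) * 0.1
print(TransE(margin=1.0).loss((h, r, t), (h, r, t_corrupt)))
```

With this split, the distributed training loop only ever calls the abstract interface, so adding another translation distance model means implementing one scoring function rather than a new training pipeline.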
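The two key-embedding optimizations can be pictured as follows. This is a sketch under assumed semantics, not HotKE's real interface: embeddings of high-frequency ("hot") entities are kept in a worker-local cache and synchronized with the parameter server only every few iterations, trading slightly stale values for far fewer network round trips. `InProcessPS` is a local stand-in for a remote parameter server.

```python
# Hedged sketch of high-frequency parameter caching with a partial
# staleness bound; all names are hypothetical, not HotKE's interface.
import numpy as np


class InProcessPS:
    """Local stand-in for a remote parameter server."""

    def __init__(self, dim=50, lr=0.01):
        self.store, self.dim, self.lr = {}, dim, lr

    def pull(self, eid):
        return self.store.setdefault(eid, np.random.randn(self.dim) * 0.1)

    def push(self, eid, grad):
        self.store[eid] = self.pull(eid) - self.lr * grad


class HotEmbeddingCache:
    """Caches hot embeddings locally; syncs every `staleness_bound` steps."""

    def __init__(self, ps, hot_ids, staleness_bound=4, lr=0.01):
        self.ps, self.hot_ids = ps, set(hot_ids)
        self.bound, self.lr, self.age = staleness_bound, lr, 0
        self.cache = {}  # eid -> [local embedding, accumulated gradient]

    def pull(self, eid):
        if eid in self.hot_ids:  # hot: serve from the local cache
            if eid not in self.cache:
                emb = self.ps.pull(eid).copy()
                self.cache[eid] = [emb, np.zeros_like(emb)]
            return self.cache[eid][0]
        return self.ps.pull(eid)  # cold: always go to the server

    def push(self, eid, grad):
        if eid in self.hot_ids:
            emb, acc = self.cache[eid]
            emb -= self.lr * grad  # apply the update locally right away
            acc += grad            # accumulate for the deferred sync
        else:
            self.ps.push(eid, grad)

    def step(self):
        # Partial staleness: flush accumulated hot gradients only when
        # the staleness bound is reached, cutting network round trips.
        self.age += 1
        if self.age >= self.bound:
            for eid, (emb, acc) in self.cache.items():
                self.ps.push(eid, acc)
                acc[:] = 0.0
            self.age = 0
```

Because embedding updates in each iteration touch only the few entities in the sampled triples, deferring the synchronization of the hottest embeddings removes most round trips while leaving the gradient information intact.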
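For the link prediction evaluation, the thesis names the task but not the protocol, so the following is a generic sketch of the standard procedure, reusing the `TransE` model from the first sketch: the tail of each test triple is replaced by every candidate entity, candidates are ranked by score, and the rank of the true tail yields metrics such as Hits@10.

```python
# Generic link prediction evaluation sketch (standard protocol,
# variable names illustrative); assumes a model with .score(h, r, t).
import numpy as np


def tail_rank(model, ent_emb, rel_emb, h_id, r_id, t_id):
    """Rank of the true tail among all candidate entities (lower is better)."""
    h, r = ent_emb[h_id], rel_emb[r_id]
    scores = np.array([model.score(h, r, ent_emb[e])
                       for e in range(len(ent_emb))])
    # Rank = 1 + number of candidates scoring strictly better (lower).
    return 1 + int((scores < scores[t_id]).sum())


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose true tail is ranked in the top k."""
    return float(np.mean([rank <= k for rank in ranks]))
```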
Keywords/Search Tags:Knowledge Graph, Knowledge Representation Learning, Distributed System, Parameter Server