Research On Adaptive Sampling Algorithm For Knowledge Graph Embedding

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Ma
Full Text: PDF
GTID: 2428330629452679
Subject: Computer software and theory
Abstract/Summary:
In recent years, the construction and application of knowledge graphs have grown rapidly. A knowledge graph is a network whose nodes are real-world entities and whose edges are relationships between entities; each fact is represented as a triple (head entity, relationship, tail entity). A large number of such triples together form a structured system of human knowledge. Many knowledge graphs have been built, such as YAGO, NELL, and Freebase, and they have been applied successfully in many fields, from semantic parsing and named entity disambiguation to information extraction and question answering.

Although triples represent structured data effectively, it is difficult to express their latent properties in this form. Knowledge graph embedding has therefore been proposed as a new research direction. The key idea is to embed the components of a knowledge graph (entities and relationships) into a continuous vector space, which simplifies operations while retaining the inherent structure of the graph. The resulting entity and relationship embeddings can then be used for tasks such as knowledge graph completion, relationship extraction, entity classification, and entity resolution. Embedding models differ in how they score triples, but they share the goal of distinguishing positive samples from negative ones.

How training data is sampled strongly affects how quickly and how well an embedding model converges. So far, negative sampling has attracted most of the attention. To reduce invalid negative samples during training, translation-based models set different probabilities for replacing the head or the tail entity. The effect of the number of negative samples per positive training sample has been explored. Entity type information has also been taken into account when reconstructing triples. In addition, GAN-based negative sampling has been proposed to deal with the zero loss problem. A positive sampling strategy, however, has not yet been proposed, even though positive sampling is also essential for further research on embedding models.

Therefore, to find efficiently the data points that may not yet be well trained, this thesis introduces an adaptive sampling method based on grouping. The training data is first divided into groups according to a rule, and samples are then drawn at random within a group, which balances time cost against training data quality. Specifically, this thesis groups triples by relationship, which divides the data simply and effectively. Because the set of insufficiently trained data changes during training, the probability of selecting each group is adjusted adaptively, so that training converges better and the model is trained more efficiently. Since it is difficult to assess exactly how well each group has been trained, the average loss of each group in the previous round is used as an approximate estimate of its training level. To avoid biasing this estimate through the "zero loss" problem, which arises from the randomness of negative examples, a "non-zero loss" mechanism is added. The results show that adaptive group sampling achieves better results on the link prediction task and makes the embedding model converge faster and better.
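
The grouping and adaptive selection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions rather than the thesis's method: the selection probability is taken to be proportional to each group's mean loss from the previous round, the "non-zero loss" mechanism is read as averaging only over non-zero losses, and the function names, the `floor` parameter, and the toy triples are all hypothetical.

```python
import random
from collections import defaultdict


def group_by_relation(triples):
    """Group (head, relation, tail) triples by their relation."""
    groups = defaultdict(list)
    for head, relation, tail in triples:
        groups[relation].append((head, relation, tail))
    return dict(groups)


def mean_nonzero_loss(losses, floor=1e-6):
    """Average only the non-zero losses of a group.

    Assumption: one possible reading of the "non-zero loss" mechanism.
    If every loss is zero, return a small floor so the group can still
    be selected.
    """
    nonzero = [x for x in losses if x > 0.0]
    if not nonzero:
        return floor
    return sum(nonzero) / len(nonzero)


def group_probabilities(prev_losses, floor=1e-6):
    """Map each group's previous-round losses to a selection probability.

    Assumption: probability is proportional to the mean non-zero loss,
    so groups that look less well trained are picked more often.
    """
    scores = {g: mean_nonzero_loss(ls, floor) for g, ls in prev_losses.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


def sample_batch(groups, probs, batch_size, rng=random):
    """Choose a group per sample by probability, then a triple uniformly."""
    names = list(groups)
    weights = [probs[n] for n in names]
    chosen = rng.choices(names, weights=weights, k=batch_size)
    return [rng.choice(groups[n]) for n in chosen]


# Toy usage with made-up triples and made-up previous-round losses.
triples = [("a", "born_in", "x"), ("b", "born_in", "y"), ("a", "works_at", "z")]
groups = group_by_relation(triples)
prev_losses = {"born_in": [0.0, 0.0, 0.2], "works_at": [0.6, 0.0]}
probs = group_probabilities(prev_losses)  # roughly 0.25 and 0.75
batch = sample_batch(groups, probs, batch_size=4, rng=random.Random(0))
```

In this sketch the probabilities would be recomputed after every round from the losses just observed, which is what makes the sampling adaptive. Selecting a group first and a triple second keeps the per-sample cost low, since no per-triple loss ranking is needed.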
Keywords/Search Tags: Knowledge Graph, Knowledge Graph Embedding, Translation-based Embedding Model, Adaptive Sampling, Link Prediction