
Research On Knowledge Graph Construction Technology Based On Semi-Supervised Learning

Posted on: 2024-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: M Xia
Full Text: PDF
GTID: 2568306944462654
Subject: Computer Science and Technology
Abstract/Summary:
The Internet is developing rapidly and the volume of network data is growing explosively. Against this background, the knowledge graph has emerged. Large-scale knowledge graphs such as Google Knowledge Graph, DBpedia and Baidu Knowledge Graph store real-world facts as triples <head entity, relation, tail entity>, abbreviated <h, r, t>, where the relation describes how the head and tail entities are related. Knowledge graphs support many intelligent applications such as web search and question answering, so knowledge graph construction has attracted wide attention in academia and industry. Entity and relation extraction and knowledge fusion are the core tasks of knowledge graph construction. Entity and relation extraction identifies the entities in a given sentence from its semantic information and predicts the relations between them. Knowledge fusion merges two knowledge graphs obtained from different data sources; in essence, it studies how to disambiguate different descriptions of the same concept across multiple data sources.

Most existing entity and relation extraction and knowledge fusion algorithms target supervised settings and require a large annotated corpus for model training, which means a high manual annotation cost. Semi-supervised learning can balance labor cost against algorithm performance. Iterative bootstrapping methods use a small amount of annotated information as seeds and expand it with new information over multiple iterations, so they suit semi-supervised scenarios. However, they suffer from "drift": if erroneous information is introduced in early iterations, the error is amplified in later iterations and the iteration direction drifts away from that of the initial annotations. Reinforcement learning learns by maximizing the expected final reward. In an iterative process, the quality of the final iteration result can therefore guide which new information is added in each iteration, which addresses the drift problem of iterative methods. This thesis designs and implements a semi-supervised knowledge graph construction technology, comprising a semi-supervised entity and relation extraction algorithm and a semi-supervised knowledge fusion algorithm, both based on reinforcement learning.

For entity and relation extraction, existing work in supervised scenarios relies on a large manually annotated corpus to train models, while existing work in semi-supervised scenarios uses a pipeline of entity extraction followed by relation extraction, which has two problems: error accumulation and insufficient information exchange between the two stages. To solve these problems, this thesis proposes a semi-supervised entity and relation extraction algorithm based on reinforcement learning. To cope with the small amount of entity and relation annotations, an iterative joint entity and relation extraction method is proposed. To better evaluate the quality of the joint extraction model, a confidence evaluation method for the joint extraction model is proposed. To solve the semantic drift problem in the iterative process, a semantic drift control method based on reinforcement learning is proposed. Experiments on the open benchmark dataset AFP_APW show that, compared with existing work, the proposed entity and relation extraction algorithm achieves better precision (P), recall (R) and F1.

For knowledge fusion, existing embedding-based knowledge fusion methods treat all entity nodes equally when representing the knowledge graph and do not distinguish between entities in the graph that are already known to be fused and those that are not. However, the known fused entities often carry richer guidance information for the knowledge fusion task. To solve this problem, this thesis proposes a semi-supervised knowledge fusion algorithm based on reinforcement learning. A knowledge graph representation method based on the attention mechanism is proposed that treats the known fused and the unfused neighbors of a candidate entity differently. To integrate knowledge graph representations obtained from different angles, a multi-angle knowledge graph representation integration method based on a gating mechanism is proposed. To solve the fusion drift problem in the iterative fusion process, a fusion drift control method based on reinforcement learning is proposed. Experiments on the public benchmark datasets EN-FR, EN-DE, D-W and D-Y show that the proposed knowledge fusion algorithm achieves better results on Hits@1, Hits@5 and MRR.
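
The ranking metrics named above can be computed as in the following minimal sketch. It uses the standard definitions of Hits@k and MRR and is not code from the thesis. The function names and the example ranks are hypothetical. Each rank is assumed to be the 1-based position of the correct counterpart entity in the candidate list returned for one source entity.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked within the top k.

    `ranks` holds 1-based ranks of the correct entity, one per query.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (1-based ranks)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four source entities (illustrative data only).
example_ranks = [1, 3, 2, 10]

hits1 = hits_at_k(example_ranks, 1)         # only the first query counts
hits5 = hits_at_k(example_ranks, 5)         # the first three queries count
mrr = mean_reciprocal_rank(example_ranks)   # (1 + 1/3 + 1/2 + 1/10) / 4

print(f"Hits@1={hits1:.2f} Hits@5={hits5:.2f} MRR={mrr:.4f}")
```

Higher values are better for all three metrics. Hits@1 rewards only exact top-ranked matches, while MRR also gives partial credit when the correct entity appears lower in the candidate list.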
Keywords/Search Tags:knowledge graph, semi-supervised learning, entity and relation extraction, knowledge fusion, reinforcement learning