
Research And Implementation Of Updating Knowledge Graph Of Vertical Domain Based On Prior Knowledge

Posted on: 2022-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Qin
GTID: 2518306536967789
Subject: Engineering
Abstract/Summary:
Knowledge graphs (KGs) play an increasingly important role in many real-world applications, and their conciseness and completeness are essential for a variety of natural language processing (NLP) tasks. However, as new data is continuously generated, outdated knowledge can significantly degrade the quality of a knowledge graph. For vertical domain knowledge graphs, relations between new entities must be extracted promptly from newly released domain normative documents to provide reliable data support for downstream tasks. To keep the vertical domain knowledge graph up to date while reducing the time and economic cost of manual labeling, this thesis designs and implements a pipeline update algorithm for vertical domain knowledge graphs. The main research contents of this thesis are as follows:

(1) A pipelined updating algorithm for vertical domain knowledge graphs is proposed. Existing knowledge graph update strategies require a large amount of annotated data to improve model performance, but only a small amount of annotated data is available for most vertical domains. Keeping up with frequently released domain normative documents therefore requires manual annotation by experts, which is inefficient and time-consuming. The proposed pipeline method incorporates prior knowledge: it first identifies new entities by jointly using a Dictionary Vocabulary and Bayesian Sets (DVBS), and then extracts relations involving the new entities with the BERT-based Bi-GRU with Dual Attention mechanism (BBGDA) model, thereby dynamically updating the KG.

(2) The DVBS method is proposed for vertical domain named entity recognition. Existing general-domain word segmentation tools segment vertical domain text at an excessively fine granularity, which impairs the effectiveness of vertical domain named entity recognition. Because named entities in domain normative documents are often compound words, a candidate set of domain terms is obtained first, and the words in the candidate set are then added to the domain named entity dictionary by category.

(3) The BBGDA relation extraction model, which incorporates attention mechanisms, is proposed. Word encoding is based on the BERT model, whose self-attention mechanism allows the inputs to interact, so that the semantic encoding of each word better fits its context. The word-level attention mechanism focuses on the semantic information that matters for relation classification. The sentence-level attention mechanism reduces the noise introduced by distant supervision.

This thesis aims to achieve incremental updating of the data layer of the vertical domain knowledge graph, tailored to the data characteristics of domain normative documents, and to keep the vertical domain knowledge graph up to date, thereby providing high-quality data for its downstream tasks.
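The abstract does not spell out how DVBS uses Bayesian Sets, so the following is only a minimal sketch of the standard Bayesian Sets score for binary features, not the thesis's implementation. It ranks candidate terms by how well they fit a small seed set of known entities. The function name, the three binary features, and the symmetric Beta(1, 1) prior are all assumptions made for illustration.

```python
import math


def bayesian_sets_scores(seeds, candidates, alpha=None, beta=None):
    """Return the log Bayesian Sets score of each candidate.

    seeds and candidates are lists of binary feature vectors (0/1 ints).
    alpha and beta are per-feature Beta prior parameters; the default
    Beta(1, 1) prior is an assumption, not taken from the thesis.
    """
    n = len(seeds)
    dims = len(seeds[0])
    alpha = alpha or [1.0] * dims
    beta = beta or [1.0] * dims

    const = 0.0   # term of the log score that does not depend on the candidate
    weights = []  # per-feature weight applied when the candidate has the feature
    for j in range(dims):
        on = sum(seed[j] for seed in seeds)   # seeds that have feature j
        a_post = alpha[j] + on                # posterior Beta parameters
        b_post = beta[j] + n - on
        const += (math.log(alpha[j] + beta[j])
                  - math.log(alpha[j] + beta[j] + n)
                  + math.log(b_post) - math.log(beta[j]))
        weights.append(math.log(a_post) - math.log(alpha[j])
                       - math.log(b_post) + math.log(beta[j]))

    return [const + sum(w * x for w, x in zip(weights, cand))
            for cand in candidates]


# Hypothetical binary features per term:
# [has a domain-specific suffix, appears in a heading, contains a digit]
seed_entities = [[1, 1, 0], [1, 0, 0]]
candidate_terms = [[1, 1, 0], [0, 0, 1]]
scores = bayesian_sets_scores(seed_entities, candidate_terms)
# The first candidate shares features with the seeds, so it scores higher.
```

In a pipeline like the one described, candidates scoring above a chosen threshold could be added to the domain entity dictionary under the category of the seed set; that threshold and the feature design would have to come from the domain.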
Keywords/Search Tags: Entity recognition, Relationship extraction, Knowledge graph, Prior knowledge