
Construction Of COVID-19 Domain Knowledge Graph Based On Pre-training Language Model

Posted on: 2022-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Li | Full Text: PDF
GTID: 2518306764980019 | Subject: Journalism and Media
Abstract/Summary:
Since Google first proposed the concept of the knowledge graph in 2012, knowledge graphs have become one of the most active research directions in natural language processing, and their applications continue to multiply. In the era of big data, data processing and data applications across industries rely increasingly on knowledge graphs. Different fields call for different knowledge graphs, such as financial and medical knowledge graphs, so the methods used to construct them also vary. Many construction techniques already exist for general natural language text. The construction of domain knowledge graphs, however, is still in its infancy, because it is often hampered by a lack of domain data and by difficulties such as the need for expert labeling. How to construct a domain knowledge graph when domain data is scarce is therefore an urgent problem. This thesis studies the key technologies of domain knowledge graph construction, namely data processing, knowledge acquisition, knowledge disambiguation and knowledge storage, with an emphasis on knowledge acquisition, that is, domain entity extraction and domain entity relation extraction. New methods are proposed to address the shortcomings of existing techniques. The main research contents of this thesis are the following three aspects.

(1) To address the lack of domain knowledge, this thesis proposes a method that constructs a domain ontology from user-defined rules, uses the ontology to expand the domain data drawn from structured and unstructured domain text, and then applies a pre-trained model to extract domain entities and domain entity relations from that data. The pre-trained model is a domain-specific BERT (BioBERT), which is well suited to domain knowledge extraction. In experiments, compared with a general-domain model, the F1 scores for domain entity extraction and domain entity relation extraction on the domain dataset increase by 5.94% and 4.23%, respectively.

(2) To address the error propagation and entity overlap problems of the first contribution, this work replaces the traditional pipeline method with joint extraction. A domain knowledge extraction method based on a pre-trained model and Tree-LSTM is proposed. The method integrates the dependency syntax tree into the BERT embedding layer and shares parameters with that layer. Entity recognition and relation extraction are thus carried out at the same time, so that errors in entity extraction do not propagate to relation extraction. Experiments verify the feasibility of the model: overall, the F1 score increases by 7.6% on the general-domain dataset NYT and by 4.1% on the domain dataset NCBI-disease.

(3) Building on the first two contributions, this thesis constructs a COVID-19 domain knowledge graph, which improves the quality of the domain knowledge graph. Using the constructed graph and current mainstream front-end and back-end frameworks, the thesis also builds a visualization system for testing. The front end calls the back-end data interface to visualize the graph, query knowledge nodes of interest, and answer simple questions.
Keywords/Search Tags:Domain Knowledge Graph, Pre-training Language Model, Entity Extraction, Relation Extraction
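
The F1 scores reported in the abstract can be illustrated with a minimal sketch. The abstract does not state the matching criterion used in the thesis, so the code below assumes exact matching of (head entity, relation, tail entity) triples, which is a common convention for joint extraction. The function name `triple_f1`, the relation labels, and the sample triples are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Tuple

# A triple is (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple],
              gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triple matching.

    Assumption: a predicted triple counts as correct only if the head,
    relation, and tail all match a gold triple exactly.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample data, for illustration only.
gold = [
    ("COVID-19", "caused_by", "SARS-CoV-2"),
    ("COVID-19", "has_symptom", "fever"),
]
predicted = [
    ("COVID-19", "caused_by", "SARS-CoV-2"),
    ("COVID-19", "has_symptom", "cough"),
]

precision, recall, f1 = triple_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of the two predicted triples matches the gold set, so precision, recall, and F1 are all 0.5. Under this criterion a triple with a wrong entity is counted as wrong even if the relation is right, which is why entity extraction errors in a pipeline lower the relation extraction score.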