Document-level Entity Relation Extraction Based On Document Structure And External Knowledge

Posted on: 2022-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
Full Text: PDF
GTID: 2518306569494824
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the information explosion era, massive amounts of text data are generated on the Internet every day in forms such as news articles, research publications, and blogs. Much important information is hidden in these documents. Automatically extracting structured information from such large volumes of unstructured text through information extraction technology is therefore of great significance for natural language processing, understanding, and generation. As an information extraction technology, relation extraction focuses on the semantic relationship between two entities and can be used in downstream tasks such as knowledge graph construction, text summarization, and information retrieval. According to a survey, 40% of the entity relation facts in Wikipedia data can only be obtained from multiple sentences, so document-level relation extraction has significant research and application value. At present, most research on relation extraction is limited to simple relations within a single sentence, ignoring cases where a sentence contains multiple relations or where two entities are related across sentences. To address this problem, this dissertation designs a document-level relation extraction model that handles intra-sentence and inter-sentence relation extraction simultaneously, and verifies it experimentally.

A document graph structure is proposed to solve the problem of cross-sentence entity relation extraction. From the document, this dissertation derives entity mention nodes, sentence nodes, entity nodes, section nodes, and document nodes. Edges are constructed according to the hierarchical relationships between nodes, yielding a hierarchical, heterogeneous document graph. The document graph is then modeled with a graph neural network that has stronger encoding ability for cross-sentence information. A softened F1 loss function (soft F-Measure loss) is used to address the imbalance between positive and negative relation samples in document-level relation extraction, so that the model can learn adaptively according to the data distribution of positive and negative samples. Experimental results show that adding the soft F-Measure loss function increases the overall F1 values on the CDR and CHR datasets by 1.2% and 0.8%, respectively. The constructed document graph structure information increases the overall F1 values by at least 1.2% and 0.9%.

This dissertation also proposes introducing external knowledge into the document graph structure as nodes. Two kinds of nodes are added, structural external knowledge nodes and descriptive external knowledge nodes, each connected to the corresponding entity nodes in the document graph. This design also addresses the lack of knowledge and the poor scalability of knowledge-enhanced text representations. A structural external knowledge node obtains its representation from knowledge graph embeddings, while a descriptive external knowledge node is represented by vectorizing the definition text of the entity itself. Experimental results show that adding structural external knowledge increases the overall F1 values on the CDR and CHR datasets by 0.8% and 0.2%, respectively, and adding descriptive external knowledge increases the overall F1 value on the CDR dataset by 0.3%.
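
The soft F-Measure loss mentioned above is not spelled out in this abstract. The sketch below shows one common way such a loss is formulated, and it is an assumption rather than the dissertation's exact definition: hard counts of true positives, false positives, and false negatives are replaced by sums over predicted probabilities, and the loss is one minus the resulting soft F1. The function name `soft_f1_loss` and the smoothing constant `eps` are hypothetical.

```python
from typing import Sequence


def soft_f1_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-8) -> float:
    """Return 1 - soft F1 for one relation class.

    probs  -- predicted probabilities in [0, 1], one per entity pair
    labels -- gold labels, 1 for a positive relation and 0 for a negative one
    eps    -- small constant (an assumption) that avoids division by zero
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    # Soft counts: probabilities stand in for hard 0/1 predictions.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))

    # F1 = 2TP / (2TP + FP + FN); true negatives never enter the formula.
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1


# One uncertain positive, with and without 100 confidently rejected negatives.
few = soft_f1_loss([0.5], [1])
many = soft_f1_loss([0.5] + [0.0] * 100, [1] + [0] * 100)
```

In this formulation, negatives that the model correctly rejects contribute nothing to the loss, so `few` and `many` are equal. That property is one plausible reason a loss of this kind suits data where negative relation samples far outnumber positive ones.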
Keywords/Search Tags: document-level relation extraction, graph neural network, data imbalance, knowledge enhancement