Research On Technologies Of Document-level Relation Extraction For Long-distance Entities

Posted on:2024-07-07

Degree:Master

Type:Thesis

Country:China

Candidate:T J Zhu

Full Text:PDF

GTID:2568307100973439

Subject:Cyberspace security

Abstract/Summary:

Relation extraction is an important information extraction technique that extracts knowledge from semi-structured and unstructured data. It has been widely applied in downstream natural language processing tasks such as information retrieval, question answering systems, and dialogue systems. Relation extraction has gradually expanded from the sentence level to the document level in order to obtain more potentially valuable knowledge. However, document-level relation extraction is more difficult because documents contain a large number of entities and the relationships between entity pairs often span multiple sentences. Existing research has mainly focused on evidence sentence extraction, graph modeling, and sequence modeling. Although some progress has been made, these approaches still face challenges such as low evidence recall, high modeling complexity, and abundant noise. To address these problems, this thesis studies long-distance entity relation extraction in documents from three aspects: multi-semantic mention interaction, double-graph modeling, and entity representation enhancement. The main contributions are as follows:

1. To address the low recall of evidence sentences, this thesis proposes a heuristic evidence sentence extraction model based on multi-semantic mention interaction. Evidence-based relation extraction not only predicts entity relationships effectively but is also highly interpretable. However, entities often appear in various forms in a document, so evidence sentences extracted from the entity perspective alone have low recall, which makes it difficult to achieve high accuracy in relation extraction. To address this problem, this thesis proposes a new and efficient heuristic model for evidence sentence extraction based on multi-semantic mention interaction. The model uses heuristic rules to extract the interaction mentions between head and tail entities based on the co-occurrence, adjacency, and semantic characteristics of mentions, and uses these interaction mentions to predict evidence sentences. Experiments on DocRED show that the model significantly improves evidence sentence recall and outperforms the baseline model Paths-BiLSTM by 6.01%.

2. To address the high complexity of entity modeling and the low extraction performance in document-level relation extraction, this thesis proposes a double-graph path inference model. Graph-based relation extraction can effectively predict entity interactions by taking advantage of the inference capabilities of graph structures. However, the large number of mentions and entities in the graph makes the structure complex and leads to poor extraction performance. To solve this problem, this thesis proposes a double-graph inference model that constructs a mention graph and an entity graph based on evidence sentences and interaction mentions. Separating entity modeling from inference simplifies the structure of the graph model and improves its relation prediction performance. Experiments on DocRED show that the model improves inference and prediction capabilities, with F1 score increases of 0.97% and 0.19% on the test and validation sets, respectively, compared to the single-graph baseline model BERT-MCN+wiki.

3. To address the noise in previous evidence-based methods, this thesis proposes a document-level relation extraction model based on evidence sentences and enhanced entity representations. Although a document contains a large amount of information that can identify entity relationship facts, not all of it is relevant to predicting entity relationships, so the text needs further noise reduction. The model takes the interaction mentions and evidence sentences extracted with the heuristic rules in Chapter 3, removes noise by keeping only the evidence sentences, and learns enhanced entity representations by combining the interaction mentions corresponding to the head and tail entities. Experiments on DocRED show that the model achieves accuracy similar to that of models that take the whole document as input, while its input text is shorter and more interpretable.
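
The abstract does not give the actual heuristic rules, so the following is only a minimal sketch of how co-occurrence and adjacency rules could select candidate evidence sentences for an entity pair. The function name `heuristic_evidence`, its inputs (sets of sentence indices in which each entity is mentioned), and the rule ordering are all assumptions made for illustration; the semantic rule mentioned in the abstract is not modeled here.

```python
from itertools import product


def heuristic_evidence(head_sentences, tail_sentences):
    """Return candidate evidence sentence indices for one entity pair.

    head_sentences / tail_sentences: sentence indices in which any mention
    of the head / tail entity appears. Hypothetical rules, not the thesis's.
    """
    head, tail = set(head_sentences), set(tail_sentences)

    # Assumed rule 1 (co-occurrence): sentences that mention both entities.
    shared = head & tail
    if shared:
        return sorted(shared)

    # Assumed rule 2 (adjacency): a head sentence and a tail sentence
    # that are directly next to each other in the document.
    adjacent = set()
    for h, t in product(head, tail):
        if abs(h - t) == 1:
            adjacent.update((h, t))
    return sorted(adjacent)


if __name__ == "__main__":
    print(heuristic_evidence({0, 3}, {3, 5}))
    print(heuristic_evidence({0, 4}, {1, 7}))
```

In this sketch an empty result means neither rule fired; a fuller system would need a further fallback, such as the semantic characteristics of mentions that the abstract refers to.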

Keywords/Search Tags:

Document-level Relation Extraction, Heuristic Rules, Graph Convolutional Network, Pre-trained Language Model, Entity Representation

Related items

1	Research On Document-level Long Text Relation Extraction Algorithms
2	Research On Document-level Relationship Extraction With Reasoning Information
3	Research On Document-level Entity Recognition And Relation Extraction Method
4	Research On Document-level Relation Extraction With Graph Convolutional Networks
5	Research On Chinese Entity Relation Extraction Based On Schemas And Pre-trained Language Models
6	Research On Key Technologies Of Semantic Relation Extraction In Real Scenarios
7	Document-level Causal Relation Extraction Based On Pretrained Language Models And Graph Convolutional Neural Networks
8	Research On Relation Extraction Based On Graph Convolutions
9	Research On Document Level Relation Extraction Method Based On Graph Convolutional Neural Network
10	Document-level Entity Relation Extraction Based On Document Structure And External Knowledge