
Research On Long Tail Problem And Cross-Sentence Relationship In Document-Level Relation Extraction

Posted on: 2024-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2568307067493094
Subject: Computer Science and Technology
Abstract/Summary:
Screening entities of interest and their relationships from massive Internet text is a current research focus in information extraction within natural language processing, and an important step in building knowledge graphs. Previous work on entity relation extraction mostly focused on the sentence level, but in practical scenarios such as news articles and company announcements, relations are more often expressed at the document level. In document-level relation extraction, documents are long and the set of relation types is large, so the long-tail problem is very common. In addition, many entity relations span multiple sentences, and this cross-sentence problem degrades extraction performance.

To address the long-tail problem, the first work in this dissertation proposes an entity relation generation model that combines prompt templates and a decoder in a sequence-to-sequence architecture. By directly generating semantic labels and then classifying entity relations, the model avoids the mapping from semantic labels to category labels and can decode multiple relation types among entities with a single encoding pass. Experiments on three document-level datasets show that the method achieves the best performance on the few-shot metric Macro F1, and highly competitive performance on the overall Micro F1 metric.

Compared to single sentences, document-level text contains longer contexts, more entities, and more complex entity interactions. The second work in this dissertation proposes a cross-sentence entity relation extraction model based on entity interactions and coreference resolution. Specifically, it uses an attention-based entity interaction module, rather than a graph neural network, to model interactions between entities, avoiding the information loss caused by predefined edge-building rules. At the same time, it uses external NLP tools to annotate indicative pronouns in the document and an attention mechanism to incorporate this coreference information into the entity representations. Experiments show that the model is faster and more effective than the current best graph-based approaches.

Unlike the first two works, which predict relations between entities given labeled entities, and considering that gold entities are not always available in realistic scenarios, the third work in this dissertation proposes a serialization approach with shared relation names, implements an end-to-end document-level relation triple generation model, and alleviates the class imbalance problem in document-level triple extraction through a two-stage training strategy. Experiments on a document-level dataset validate the effectiveness of the approach, which achieves the best performance to date on two evaluation metrics.
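The first work's key idea, decoding the semantic label text directly instead of mapping generated text to class IDs through a verbalizer, can be illustrated with a minimal sketch. The relation names and the post-processing function below are illustrative assumptions, not the dissertation's actual code:

```python
# Hypothetical sketch: the seq2seq decoder emits a semantic label string
# (e.g. "place of birth") that IS the relation name, so no separate
# verbalizer mapping from generated tokens to category labels is needed.
RELATIONS = {"place of birth", "chief executive officer", "no relation"}

def decode_relation(generated_text):
    """Accept the generated span only if it is a known relation name;
    otherwise fall back to the null relation (an assumed convention)."""
    label = generated_text.strip().lower()
    return label if label in RELATIONS else "no relation"

print(decode_relation(" Place of Birth "))  # place of birth
print(decode_relation("unrelated output"))  # no relation
```

In this framing, classification reduces to string matching against the relation inventory, which is what removes the label-mapping step the abstract mentions.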
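The second work's attention-based entity interaction module, used in place of a graph neural network, can be sketched as plain scaled dot-product attention over entity representations. This is a minimal NumPy illustration of the general technique; the function names and dimensions are hypothetical, not the model's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entity_interaction(entity_reps):
    """Let every entity attend to every other entity via scaled
    dot-product attention, instead of message passing over a graph
    built from predefined edge rules (which can drop information)."""
    d_k = entity_reps.shape[-1]
    scores = entity_reps @ entity_reps.T / np.sqrt(d_k)  # (n, n) pairwise scores
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ entity_reps                         # interaction-aware reps

# toy example: 4 entity representations of dimension 8
rng = np.random.default_rng(0)
ents = rng.standard_normal((4, 8))
out = entity_interaction(ents)
print(out.shape)  # (4, 8)
```

Because every entity pair gets a learned (here, dot-product) weight, no pair is excluded up front, which is the information-loss argument the abstract makes against predefined edge-building rules.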
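The third work's two-stage training strategy for class imbalance can be sketched as a data-side procedure: train first on the full triple set, then revisit long-tail relation types with oversampling. The threshold, sampling scheme, and function below are illustrative assumptions about how such a strategy might look, not the dissertation's actual method:

```python
from collections import Counter
import random

def two_stage_batches(triples, tail_threshold=2, seed=0):
    """Stage 1: the full triple set (head and tail relations together).
    Stage 2: only triples whose relation type is rare (count <= threshold),
    oversampled toward the head-class size so rare relation types
    receive extra gradient signal in a second training pass."""
    rng = random.Random(seed)
    counts = Counter(rel for _, rel, _ in triples)
    stage1 = list(triples)
    tail = [t for t in triples if counts[t[1]] <= tail_threshold]
    target = max(counts.values())
    n_tail_types = len({t[1] for t in tail})
    stage2 = [rng.choice(tail) for _ in range(target * n_tail_types)] if tail else []
    return stage1, stage2

# toy corpus: one frequent relation, one rare relation
triples = [("A", "born_in", "X")] * 5 + [("B", "ceo_of", "Y")]
s1, s2 = two_stage_batches(triples)
print(len(s1), len(s2))  # 6 5
```

The design choice here is to keep stage 1 unbiased (the model sees the true distribution) and confine the rebalancing to stage 2, so head-relation performance is not sacrificed while tail relations are boosted.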
Keywords/Search Tags:Long-tail Problem, Entity Relation Extraction, Prompt Learning, Document-Level Text