
Research On Distant Supervision Relation Extraction Technology With Pre-trained Language Models

Posted on: 2022-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: S C Nie
Full Text: PDF
GTID: 2518306569981979
Subject: Software engineering
Abstract/Summary:
As an important subtask of information extraction, relation extraction provides essential technical support for many areas of natural language processing and has significant research and application value. Traditional methods rely on manual annotation to build entity-relation datasets. However, manual annotation is cumbersome and expensive: only a small number of samples can be labeled, which makes it difficult to learn effective entity-relation features. To address this, distant supervision uses an external knowledge base as the supervision source and automatically labels large numbers of sentences with the facts already stored in the knowledge base. In this way, distant supervision alleviates the small-dataset problem faced by supervised models and has become a research hotspot in relation extraction. However, the automatically annotated data contain a great deal of noise, which substantially harms the accuracy of relation extraction. Consequently, current research focuses mainly on reducing the noisy data in these datasets.

To address these problems, this thesis proposes a bidirectional GRU model that combines pre-trained language models with a multi-level attention mechanism, ERNIE-BERT-HA-BiGRU (EBHB), consisting of two main parts:

(1) Data analysis shows that false-positive (FP) noise is a key factor limiting model performance. A dataset-reconstruction denoising strategy for distant supervision based on the pre-trained language model ERNIE is therefore proposed; by reconstructing the dataset, false-positive noise in the sentence bags can be effectively removed.

(2) To capture deeper semantic information, the pre-trained language model BERT is used to embed external knowledge into the training set. The resulting representations are fed into a bidirectional gated recurrent unit (Bi-GRU) network to capture the sequence features present in each sentence. A multi-level attention mechanism (HA) is then added on top of the Bi-GRU to highlight, at both the word and sentence level, the parts of the data that express the entity relation.

Comparison and ablation experiments are conducted on the NYT+Freebase distantly supervised dataset. The comparison results show that the EBHB model outperforms most mainstream models on the standard evaluation metrics, and the ablation results verify that both the dataset-reconstruction denoising strategy and the prior-knowledge embedding strategy effectively reduce noise interference and improve EBHB's performance. In addition, case studies of dataset reconstruction and visualizations of the attention weights illustrate the characteristics and effectiveness of the strategies.
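The distant-supervision labeling scheme described above can be sketched as follows. This is a minimal illustration, not the thesis's pipeline: entity matching here is naive substring search (the thesis aligns Freebase triples with NYT text), and the knowledge-base triple and sentences are hypothetical. Note how the second sentence mentions both entities without expressing the founding relation, producing exactly the false-positive (FP) bag noise the thesis targets.

```python
from collections import defaultdict

def distant_label(kb_triples, sentences):
    """Label every sentence that mentions both entities of a KB triple with
    that triple's relation, grouping labeled sentences into bags keyed by
    the (head, relation, tail) triple."""
    bags = defaultdict(list)
    for head, relation, tail in kb_triples:
        for sent in sentences:
            # Naive substring match stands in for real entity linking.
            if head in sent and tail in sent:
                bags[(head, relation, tail)].append(sent)
    return dict(bags)

# Hypothetical knowledge-base triple and corpus sentences:
kb = [("Steve Jobs", "founder_of", "Apple")]
sents = [
    "Steve Jobs co-founded Apple in 1976.",       # correctly labeled
    "Steve Jobs returned to Apple in 1997.",      # FP noise: mentions both
]                                                 # entities, wrong relation

bags = distant_label(kb, sents)
# Both sentences land in the same bag; denoising (e.g. the thesis's
# ERNIE-based dataset reconstruction) would aim to drop the second.
```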
Keywords/Search Tags: distant supervision, relation extraction, data set reconstruction, multi-level attention