Denoise Algorithms In Distant Supervised Relation Extraction

Posted on: 2022-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Huang
GTID: 2518306752952849
Subject: Automation Technology

Abstract/Summary:
Entity relation extraction is an essential task in natural language processing. Its ultimate goal is to expand knowledge graphs by extracting relation triplets from given sentences. Supervised methods depend on expert annotation, which restricts the development of this task. To reduce the manpower consumed by manual annotation, distant supervision was proposed for large-scale automatic annotation: labeled data are generated by aligning an existing knowledge graph with plain text. However, this alignment also introduces a large amount of noisy data, which strongly affects both the training and the testing of relation extraction models. The key problem in distantly supervised relation extraction is therefore denoising, which is the topic of this dissertation.

The first work discusses how the influence function relates to screening out noisy data in distantly supervised datasets. As a robust statistical method, the influence function measures the impact of each training point on the trained model: the influence of a training point is defined as the change in test loss after that point is removed. Both the definition and the computation of the influence function are closely related to finding noisy data. Based on this, we designed the denoising criterion IF-D and then proposed a denoising algorithm within a bootstrapping framework based on IF-D. Experiments verify the criterion IF-D and show the effectiveness and interpretability of the proposed algorithm.

The second work comes from an error analysis of the first. The criterion IF-D may not hold when the proportion of noisy data is high, so we proposed a new noise criterion, IF-C. IF-C is also integrated into the bootstrapping process, and a teacher-student mechanism is applied during bootstrapping to control parameter updates. Furthermore, to analyze denoising quantitatively, we constructed a synthetic noise dataset that simulates various proportions of noisy data. The experimental results show that the proposed methods achieve competitive results on the public dataset, and the robustness of the algorithm was verified on the synthetic noise dataset.

The third work proposes a denoising algorithm based on graph neural networks. We use a graph neural network to model the training instances and the relationships among them. With a graph attention network, noisy data can be separated from clean data by updating the attention weights during training, and the attention weights can further be used to reduce the impact of noisy data during training. Experiments show that this method has a good denoising effect.

In general, the first two works are instance-level denoising algorithms based on the influence function that filter out noisy data before relation extraction training, while the third work reduces the impact of noisy instances during training to obtain better relation extraction parameters. Both approaches can denoise distantly supervised datasets.
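To make the removal-based definition of influence concrete, here is a minimal sketch on a toy problem. It is not the thesis's IF-D criterion, whose exact form this abstract does not give. Assumptions: a one-parameter logistic model without bias, trained on the summed loss plus an L2 penalty (λ/2)θ²; hypothetical 1-D data in which one label is deliberately flipped; and the standard first-order approximation ΔL_test ≈ ∇L_testᵀ H⁻¹ ∇ℓ(z), where H is the Hessian of the training objective at the trained θ. Points whose estimated removal would lower the test loss are flagged as suspected noise. Exact leave-one-out retraining is included only to check the estimate. All function names are hypothetical.

```python
import math

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def point_loss(theta, x, y):
    # logistic loss log(1 + exp(-y * theta * x)), labels y in {-1, +1}
    m = y * theta * x
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def point_grad(theta, x, y):
    # derivative of point_loss with respect to theta
    return -y * x * sigmoid(-y * theta * x)

def point_hess(theta, x):
    # second derivative of point_loss; it does not depend on the label
    p = sigmoid(theta * x)
    return x * x * p * (1.0 - p)

def fit(data, lam):
    # Newton's method on sum(point_loss) + lam / 2 * theta^2 (strictly convex)
    theta = 0.0
    for _ in range(100):
        g = sum(point_grad(theta, x, y) for x, y in data) + lam * theta
        h = sum(point_hess(theta, x) for x, _ in data) + lam
        step = g / h
        theta -= step
        if abs(step) < 1e-12:
            break
    return theta

def total_loss(theta, data):
    return sum(point_loss(theta, x, y) for x, y in data)

def removal_influence(train, test, lam=0.1):
    # first-order estimate of the change in test loss if each training point were removed
    theta = fit(train, lam)
    h = sum(point_hess(theta, x) for x, _ in train) + lam
    g_test = sum(point_grad(theta, x, y) for x, y in test)
    return [g_test * point_grad(theta, x, y) / h for x, y in train]

def exact_removal_change(train, test, lam=0.1):
    # ground truth for comparison: retrain without each point (leave-one-out)
    base = total_loss(fit(train, lam), test)
    return [total_loss(fit(train[:i] + train[i + 1:], lam), test) - base
            for i in range(len(train))]

train = [(1.0, 1), (2.0, 1), (1.5, 1), (0.5, 1),
         (-1.0, -1), (-2.0, -1), (-1.5, -1), (1.8, -1)]  # last label is flipped (hypothetical noise)
test = [(1.2, 1), (-0.8, -1)]
est = removal_influence(train, test)
exact = exact_removal_change(train, test)
suspects = [i for i, s in enumerate(est) if s < 0]  # removing these should lower test loss
print(suspects)  # [7]
```

On this data only the flipped point gets a negative score, and the sign of every estimate agrees with exact retraining. With realistic relation extraction models H⁻¹ is far too large to form explicitly, so it is usually approximated (for example through Hessian-vector products) instead.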
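The synthetic noise dataset is described only at a high level. One common way to simulate a chosen noise proportion, given here as an assumption rather than the thesis's exact procedure, is to flip the labels of a random subset of samples to a different relation and keep the corrupted indices, so that any denoising filter can be scored by precision and recall against a known ground truth. The function names, relation labels, and sample sentences are hypothetical.

```python
import random

def inject_label_noise(samples, relations, noise_ratio, seed=0):
    """Flip the relation label of round(noise_ratio * len(samples)) random samples.

    Returns the noisy copy and the set of corrupted indices, which serves as
    the ground truth for evaluating a denoising filter."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible dataset
    n_noisy = round(noise_ratio * len(samples))
    noisy_idx = set(rng.sample(range(len(samples)), n_noisy))
    noisy = []
    for i, (sentence, relation) in enumerate(samples):
        if i in noisy_idx:
            # replace with a different relation chosen uniformly at random
            relation = rng.choice([r for r in relations if r != relation])
        noisy.append((sentence, relation))
    return noisy, noisy_idx

def filter_precision_recall(flagged, noisy_idx):
    """Precision: share of flagged samples that are truly noisy.
    Recall: share of truly noisy samples that were flagged."""
    flagged = set(flagged)
    hits = len(flagged & noisy_idx)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(noisy_idx) if noisy_idx else 0.0
    return precision, recall

relations = ["born_in", "founder_of", "NA"]  # hypothetical label set
clean = [(f"sentence {i}", relations[i % 2]) for i in range(10)]
noisy, noisy_idx = inject_label_noise(clean, relations, noise_ratio=0.3)
print(len(noisy_idx))  # 3
```

Sweeping `noise_ratio` over several values yields datasets with known noise proportions, which is what allows the robustness of a denoising algorithm to be measured rather than only inferred from end-task scores.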
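For the graph-based work, the abstract states only that attention weights help separate noisy from clean instances and reduce their impact during training. The sketch below computes the standard single-head graph attention coefficient, α_ij = softmax over j ∈ N(i) of LeakyReLU(aᵀ[W h_i ‖ W h_j]), on a toy graph. Everything else is an assumption and not the thesis's architecture: the graph (four instances, fully connected with self-loops, as if they shared one entity pair), the fixed W and a (normally learned), the features, and the hypothetical rule that turns the attention each instance receives into a loss weight.

```python
import math

def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(W, h):
    return [dot(row, h) for row in W]

def gat_attention(features, neighbors, W, a_src, a_dst):
    """Single-head GAT coefficients alpha[i][j] for each j in neighbors[i].

    a^T [W h_i || W h_j] is split as a_src . W h_i + a_dst . W h_j."""
    z = [matvec(W, h) for h in features]  # projected features W h_i
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {j: leaky_relu(dot(a_src, z[i]) + dot(a_dst, z[j])) for j in nbrs}
        m = max(scores.values())  # subtract the max for a numerically stable softmax
        exps = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(exps.values())
        alpha[i] = {j: e / total for j, e in exps.items()}
    return alpha

def instance_weights(alpha, n):
    """Hypothetical rule: weight each instance by the total attention it receives, normalised to sum to 1."""
    received = [0.0] * n
    for row in alpha.values():
        for j, a in row.items():
            received[j] += a
    total = sum(received)
    return [r / total for r in received]

def weighted_loss(losses, weights):
    # per-instance losses scaled by attention-derived weights
    return sum(w * l for w, l in zip(weights, losses))

features = [[1.0, 0.9], [0.9, 1.0], [1.1, 1.0], [-1.0, 0.2]]  # instance 3 is the outlier
neighbors = {i: [0, 1, 2, 3] for i in range(4)}  # fully connected, self-loops included
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability only
a_src, a_dst = [0.1, 0.1], [1.0, 1.0]
alpha = gat_attention(features, neighbors, W, a_src, a_dst)
weights = instance_weights(alpha, len(features))
print(weights.index(min(weights)))  # 3
print(weighted_loss([0.7, 0.6, 0.8, 2.5], weights))
```

In this scoring form the ranking of neighbours depends only on a_dst · W h_j, so every node assigns the outlier its lowest coefficient. In a trained model W and a would be learned together with the relation classifier, so the weights would shift as training proceeds.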
Keywords/Search Tags: distant supervision, relation extraction, influence function, graph neural networks, bootstrapping