
Research On Distantly-Supervised Long-Tailed Relation Extraction

Posted on: 2022-06-22    Degree: Master    Type: Thesis
Country: China    Candidate: T M Liang    Full Text: PDF
GTID: 2518306572950959    Subject: Software engineering
Abstract/Summary:
As one of the fundamental tasks in natural language processing, relation extraction aims to identify the semantic relations between entities in unstructured text. Limited by the cost of manual annotation, conventional supervised relation extraction is hard to scale up. By aligning entity pairs between existing knowledge bases and a corpus, distant supervision can automatically generate large-scale annotated data for relation extraction, and it has therefore attracted extensive attention from researchers since it was proposed. However, the data collected through distant supervision not only contains substantial label noise but also follows an extremely long-tailed distribution: a small proportion of relations (the head relations) occupy most of the data, while most relations (the long-tailed relations) have only a few training instances each. To apply distantly supervised data to the training of relation extraction models, it is necessary both to reduce the influence of label noise and to ensure the performance of the model on long-tailed relations. The problems of label noise and long-tailed relations have thus become the two most critical challenges in distantly supervised relation extraction.

This paper studies both problems, aiming to train robust and balanced relation extraction models from large-scale noisy and long-tailed datasets. The main innovations of this paper are as follows:

(1) This paper proposes a novel perspective on the problem of label noise, which unifies the filtering of noisy data and the improvement of the neural architecture as a single task of magnitude-based pruning. From this perspective, a denoising algorithm for distantly supervised relation extraction is derived on the basis of the lottery ticket hypothesis: the algorithm filters out noisy instances and improves the neural architecture through iterative magnitude-based pruning. Through theoretical analysis and experimental evaluation, this paper draws an important conclusion: under the paradigm of multi-instance learning, filtering data with the selective attention mechanism is equivalent to pruning the weights of the encoder. Hence, existing general-purpose neural network pruning algorithms can be directly applied to data denoising in distantly supervised relation extraction.

(2) To address the problem of long-tailed relations, the constraint graph, a novel relation-dependency structure, is proposed to explicitly model the potential dependencies between relation labels. On the basis of constraint graphs, this paper proposes a novel long-tailed relation extraction framework, which exploits the neighborhood aggregation mechanism of graph convolutional networks to propagate information among relation nodes and thus boosts the representation learning of long-tailed relations. To further improve the noise immunity of the framework, a constraint-aware attention module is designed to combine semantic information from the context with constraint information from the constraint graph.
Keywords/Search Tags:Distant Supervision, Relation Extraction, Long-tailed Distribution, Multi-Instance Learning, Lottery Ticket Hypothesis