
Distantly Supervised Relation Extraction Based on Multi-Level Noise Reduction

Posted on: 2023-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: S Q Li    Full Text: PDF
GTID: 2558306845499304    Subject: Computer Science and Technology

Abstract/Summary:
Relation extraction is an important subtask of information extraction that aims to predict the semantic relationship between two entities. It has significant application value in knowledge graph construction, social network analysis, recommendation and search, and other fields. In recent years, with the development of deep neural networks, relation extraction has made new breakthroughs. However, the performance of supervised methods is limited by the scale of the annotated corpus, so distantly supervised methods have become a focus of current research. Distant supervision automatically annotates a corpus by aligning it with a knowledge base. Although this greatly reduces labor and time costs and quickly yields large-scale annotated data, it also inevitably introduces incorrect labels. Therefore, a major challenge in distantly supervised relation extraction is reducing the impact of noise on extraction performance.

Under the multiple-instance learning framework, research on sample noise reduction has made some progress, but two problems remain: 1) current noise reduction methods mostly focus on sentence-level noise and ignore the noise caused by irrelevant words within a sentence; 2) when training with bags as the basic unit, sentence-level attention cannot produce an effective weight distribution for single-sentence bags and noisy bags, so invalid bag features introduce bag-level noise into model training. To address these problems, this thesis studies distantly supervised relation extraction based on multi-level noise reduction. The main research contents and innovations are as follows:

(1) A distantly supervised relation extraction method based on a self-attention gated convolutional neural network is implemented. To address word-level noise, this thesis uses self-attention to capture correlations between words and incorporates knowledge representation learning to enrich semantic features; word-level noise is filtered out through the resulting internal weight allocation. In addition, for the same input sequence, a convolutional neural network and the self-attention mechanism encode in parallel, and the context-dependent features learned by self-attention are integrated into the sentence representation through a gating mechanism to enrich sentence semantics.

(2) A distantly supervised relation extraction method incorporating bag-level attention is proposed. To address bag-level noise, this thesis introduces a higher level of attention, bag-level attention, on top of sentence-level attention. Several bags under the same relation label are aggregated into a group, and the similarity between bags is used to compute an effective weight distribution over them, so as to select the high-quality bags that correspond to the target relation.

Experiments are carried out on the public NYT-Freebase dataset. The results show that both methods effectively improve the performance of distantly supervised relation extraction compared with the baseline models, and the AUC of the model reaches 47.1% when the word-level, sentence-level, and bag-level denoising strategies are combined. In addition, detailed ablation experiments and case studies verify that the proposed model effectively suppresses the propagation of word-level, sentence-level, and bag-level noise.
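To make the two denoising components above concrete, the following is a minimal NumPy sketch of the general ideas: a gated fusion of convolutional and self-attention features for one sentence, and an attention-weighted aggregation over a group of bags that share a relation label. It is an illustration based only on the abstract, not the thesis's actual implementation; all names (self_attention, gated_fusion, bag_level_attention, Wg, Ug, relation_query, etc.) and the exact gating formula are assumptions introduced here, and knowledge representation learning and sentence-level attention are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len = 64, 20          # shared feature dimension and toy sentence length

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V          # (seq_len, d) context-dependent features

def conv_features(X, filters, width=3):
    """Simple 1-D convolution over tokens, padded so the sequence length is preserved."""
    pad = np.pad(X, ((width // 2, width // 2), (0, 0)))
    windows = np.stack([pad[i:i + width].ravel() for i in range(X.shape[0])])
    return np.tanh(windows @ filters.T)          # (seq_len, d) local n-gram features

def gated_fusion(conv_feat, attn_feat, Wg, Ug, bg):
    """Gate that mixes self-attention context into the convolutional features."""
    g = 1.0 / (1.0 + np.exp(-(conv_feat @ Wg + attn_feat @ Ug + bg)))
    return g * conv_feat + (1.0 - g) * attn_feat

def bag_level_attention(bag_vecs, relation_query):
    """Weight the bags in one relation group by similarity to a relation query vector."""
    weights = softmax(bag_vecs @ relation_query)
    return weights @ bag_vecs                    # group vector dominated by cleaner bags

# Toy forward pass: one sentence -> fused token features -> sentence vector,
# then attention over a group of bags that share the same relation label.
X = rng.standard_normal((seq_len, d))                    # token embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
filters = rng.standard_normal((d, 3 * d))
Wg, Ug, bg = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)

attn = self_attention(X, Wq, Wk, Wv)
conv = conv_features(X, filters)
sentence_vec = gated_fusion(conv, attn, Wg, Ug, bg).max(axis=0)   # max-pool over tokens

bags = rng.standard_normal((5, d))                       # 5 bag representations in one group
group_vec = bag_level_attention(bags, rng.standard_normal(d))
print(sentence_vec.shape, group_vec.shape)               # (64,) (64,)
```

In this sketch the gate interpolates per dimension between the two encoders' outputs, which matches the abstract's description of injecting context-dependent self-attention features into the convolutional sentence representation, while bag_level_attention reweights bags within a relation group so that low-quality bags contribute less.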
Keywords/Search Tags: Relation extraction, Distant supervision, Self-attention mechanism, Knowledge representation learning, Gated convolution