Research On Key Technology Of Relation Extraction Based On Distant Supervision

Posted on: 2023-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Sun
GTID: 2568306782963709
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, massive amounts of information are appearing on the Internet, usually in the form of unstructured text. The goal of information extraction is to extract key information from this data and produce structured output for applications such as knowledge graphs, sentiment analysis, and information retrieval. As a key subtask of information extraction, relation extraction is mainly used to identify the relations between entities expressed in sentences. Current supervised relation extraction methods rely heavily on manually annotated training data, which is time-consuming and labor-intensive to produce, so these methods are often limited to small-scale tasks. Relation extraction based on distant supervision has received extensive attention from researchers because it can acquire large-scale datasets quickly and easily. However, its data generation procedure is overly simple and direct, so noise is inevitably introduced. This thesis focuses on the noise problem caused by distant supervision and on the shortcomings of neural network methods, and conducts the following research:

(1) To address the noise problem caused by distant supervision, a relation extraction model based on improved attention and label matching is proposed. First, an improved attention module is proposed to capture key information. This module dynamically adjusts the weights of words in combination with relation vectors, providing more salient features for the model and yielding better sentence vector representations. Then, a dedicated label matching module re-matches sentences with high-confidence labels, converting part of the noisy data into effective training data. This reduces noise while adding effective training examples, which benefits model training.

(2) To deepen the network's understanding of sentence semantics and improve relation extraction performance, a feature fusion model based on a multi-head self-attention mechanism is proposed. In relation extraction tasks, both key words and key phrases in a sentence can provide useful information for model prediction. This thesis uses a multi-head self-attention mechanism to enhance important word features and phrase features separately, which captures long-distance dependencies within a sentence while extracting key information. The two kinds of information are then integrated through feature fusion to obtain a semantically enhanced sentence feature representation, which provides richer information for the model and benefits model training. Finally, multi-instance learning is adopted to predict relations at the bag level.

We conducted comparative experiments against multiple models on a public dataset. The average P@N of the two methods proposed in this thesis reached 79.5% and 79.7%, respectively, and their precision/recall curves were better than those of several comparison models. These results indicate that the methods proposed in this thesis improve relation extraction performance.
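The abstract reports results as an average P@N but does not define the metric. The following is a minimal sketch of how such a figure is commonly computed, assuming the usual definition: predictions are ranked by model confidence, and P@N is the fraction of correct predictions among the top N. The function names, the input layout (a list of confidence/correctness pairs), and the default cutoffs of 100, 200, and 300 are assumptions for illustration and are not taken from the thesis.

```python
from typing import Iterable, List, Tuple

# Each prediction is (confidence, is_correct). This layout is an assumption
# made for the sketch, not the thesis's actual data format.
Prediction = Tuple[float, bool]


def precision_at_n(predictions: List[Prediction], n: int) -> float:
    """Precision among the n highest-confidence predictions."""
    # Rank predictions from most to least confident.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top = ranked[:n]
    if not top:
        return 0.0
    # If fewer than n predictions exist, divide by the number available.
    return sum(1 for _, correct in top if correct) / len(top)


def average_precision_at_n(
    predictions: List[Prediction],
    cutoffs: Iterable[int] = (100, 200, 300),  # assumed cutoffs
) -> float:
    """Mean of P@N over the given cutoffs."""
    cutoffs = list(cutoffs)
    return sum(precision_at_n(predictions, n) for n in cutoffs) / len(cutoffs)


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
    print(precision_at_n(toy, 2))
    print(average_precision_at_n(toy, cutoffs=(2, 4)))
```

In the distant supervision setting, each scored item would typically be a bag-level relation prediction rather than a single sentence, which matches the bag-level prediction described in the abstract.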
Keywords/Search Tags: Relation extraction, Distant supervision, Label matching, Multi-head self-attention, Feature fusion