Research On Distant Supervision For Relation Extraction Based On Entity Type Information

Posted on: 2020-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Xiao
GTID: 2428330575977795
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the big data era, the network generates a huge amount of information every second, and text accounts for a very high proportion of it. How to extract the key information from large volumes of heterogeneous text has long been an active research topic. Relation extraction is an important branch of information extraction and a significant way to parse heterogeneous text: it detects and reveals the semantic relations between entities in text. According to the source of the training data, relation extraction can be divided into supervised, unsupervised, semi-supervised, and distantly supervised relation extraction. Among these, distantly supervised relation extraction has attracted considerably more attention from researchers because it has no domain limitation and is suitable for large-scale data sets.

Existing distantly supervised relation extraction methods have three main shortcomings. First, to obtain a training set, these methods align relation instances in the knowledge base with natural language text under an assumption that is often too strong to hold in practice. This introduces a lot of noise into the training data, which seriously degrades extraction performance. Second, these methods require manually designed sentence features, which are usually extracted with natural language processing tools. Errors in this process are inevitable and propagate to later stages, which further degrades performance. Third, these methods lack background knowledge about the entities as supplementary information, so they cannot extract more correct relation instances or reduce erroneous predictions, which hinders further improvement.

To overcome these three shortcomings, this paper proposes a distantly supervised relation extraction model based on entity type information, namely PCNN+ATT+ET. The model combines the entity type information with the information contained in the sentence and emphasizes the influence of the words between the two entities on relation extraction, so that more correct relation instances can be extracted and erroneous predictions can be reduced. Word2Vec is used to vectorize the sentences and the entity types, and the weighted sum of the two resulting vectors is used as the input of the model. To reduce noise, the inputs are then grouped into bags following multi-instance learning. To avoid the error propagation caused by natural language processing tools, a piecewise convolutional neural network (PCNN) automatically learns the features of the sentence and the entity type information. Finally, an attention mechanism is introduced that makes full use of all the valid instances in the same bag and reduces the impact of noisy instances.

To verify the effect of PCNN+ATT+ET, three groups of controlled experiments are designed. Two evaluation methods are used to compare the new model with several state-of-the-art distantly supervised relation extraction models. In these experiments, PCNN+ATT+ET achieves higher precision and recall than the other methods.
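The three building blocks named in the abstract (a weighted sum of word and entity type vectors, piecewise pooling, and attention over a bag) can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function names, the mixing weight `alpha`, the three-segment split at the entity positions, and the dot-product attention score are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def combine(word_vec, type_vec, alpha=0.5):
    """Weighted sum of a word vector and an entity type vector.

    alpha is a hypothetical mixing weight; the abstract does not give one.
    """
    return [alpha * w + (1.0 - alpha) * t for w, t in zip(word_vec, type_vec)]


def piecewise_max_pool(feature_map, e1_pos, e2_pos):
    """Max-pool one convolution filter's outputs over three segments.

    feature_map: one float per token position.
    e1_pos, e2_pos: token indices of the two entities (e1_pos < e2_pos).
    Assumed split: up to entity 1, after entity 1 up to entity 2, after entity 2.
    """
    segments = [
        feature_map[: e1_pos + 1],
        feature_map[e1_pos + 1 : e2_pos + 1],
        feature_map[e2_pos + 1 :],
    ]
    # An empty segment (entity at the sentence boundary) contributes 0.0.
    return [max(seg) if seg else 0.0 for seg in segments]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_attention(sentence_vectors, relation_query):
    """Attention over the sentences in one bag.

    Each sentence is scored by its dot product with a relation query
    vector (an assumed, simplified scoring function). Returns the
    attention weights and the weighted bag representation.
    """
    scores = [
        sum(x * q for x, q in zip(vec, relation_query))
        for vec in sentence_vectors
    ]
    weights = softmax(scores)
    dim = len(relation_query)
    bag = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vectors))
        for d in range(dim)
    ]
    return weights, bag


if __name__ == "__main__":
    pooled = piecewise_max_pool([0.1, 0.9, 0.3, 0.7, 0.2, 0.5], e1_pos=1, e2_pos=3)
    weights, bag = bag_attention([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print(pooled, weights, bag)
```

In this sketch a sentence that matches the relation query poorly receives a small attention weight, so it contributes little to the bag representation; this is one simple way to picture how attention can soften the effect of noisy instances in a bag.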
Keywords/Search Tags: Distant Supervised Relation Extraction, Neural Network, Entity Type, Attention Mechanism