
Research On Entity Relation Extraction Method Based On Distant Supervision

Posted on: 2020-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2428330623967023
Subject: Software engineering
Abstract/Summary:
As one of the main tasks of information extraction, entity relation extraction aims to determine the relation category of two entities in unstructured text, and it provides a foundation for building knowledge graphs and intelligent question answering applications. At present, supervised methods are the most commonly used for entity relation extraction, and their results improve markedly when deep neural network models are applied to the task. Although supervised relation extraction achieves high accuracy, it requires a large manually annotated corpus. Annotating a corpus is time-consuming, which limits large-scale extraction.

To address the shortage of manually annotated data, distant supervision has become a popular approach to large-scale relation extraction, because it can obtain a large amount of labeled data through heuristic alignment between a knowledge base and a corpus. However, distant supervision easily generates a large amount of noisy data, which degrades the performance of entity relation extraction. To solve this problem, this thesis proposes a denoising method based on semantic similarity and constructs a distant supervision relation extraction model that integrates a multi-level attention mechanism. The main research work of this thesis is as follows:

(1) To address the large amount of noisy data in traditional distant supervision relation extraction methods, this thesis proposes a noise filtering method based on semantic similarity. The method determines whether a sentence is correctly labeled by calculating the semantic similarity between the shortest dependency path of the entity pair in the sentence and relational phrases, using a Jaccard similarity measure based on word embeddings. The filtered labeled data are then classified by a Piecewise Convolutional Neural Network (PCNN) relation extraction model. The experimental results show that the proposed denoising method significantly improves entity relation extraction.

(2) Current research on distant supervision relation extraction usually does not fully use the high-level semantics of the context words in a sentence, and does not consider the dependency and inclusion relationships between relations. This thesis proposes a distant supervision relation extraction model that integrates a multi-level attention mechanism, applying attention at the word level, the sentence level and the relation level. Word-level attention captures the high-level semantic information of the sentence context. Sentence-level attention reduces the impact of labeling errors. Relation-level attention automatically captures the dependency and inclusion relationships between different relations. Experimental results on public real-world data sets show that the proposed model improves the precision-recall curve by about 4% compared with existing mainstream methods and achieves better relation extraction performance.
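The denoising step in (1) can be illustrated with a small sketch. The abstract does not give the exact formula, so this is only one plausible reading of a "word embedding Jaccard" measure: two words count as shared when the cosine similarity of their embeddings reaches a threshold, and the Jaccard ratio is computed over those soft matches. The function names (`soft_jaccard`, `keep_sentence`), the thresholds `tau` and `min_sim`, and the toy vectors are all assumptions made for illustration, not the thesis's implementation. In practice the path words would come from a dependency parser and the vectors from pretrained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def soft_jaccard(path_words, phrase_words, emb, tau=0.8):
    """Hypothetical embedding-based Jaccard: |soft matches| / |soft union|.

    Words are matched greedily one-to-one; a pair counts as a match when
    the cosine similarity of its embeddings is at least tau.
    """
    a = sorted(w for w in set(path_words) if w in emb)
    b = sorted(w for w in set(phrase_words) if w in emb)
    if not a or not b:
        return 0.0
    remaining = list(b)
    matches = 0
    for w in a:
        best = max(remaining, key=lambda x: cosine(emb[w], emb[x]), default=None)
        if best is not None and cosine(emb[w], emb[best]) >= tau:
            matches += 1
            remaining.remove(best)
    union = len(a) + len(b) - matches
    return matches / union

def keep_sentence(path_words, relation_phrases, emb, tau=0.8, min_sim=0.5):
    """Keep a distantly labeled sentence if its shortest dependency path is
    similar enough to at least one phrase describing the labeled relation."""
    best = max((soft_jaccard(path_words, p, emb, tau) for p in relation_phrases),
               default=0.0)
    return best >= min_sim

# Toy 3-dimensional embeddings, invented purely for this example.
toy_emb = {
    "born": [1.0, 0.0, 0.0],
    "birthplace": [0.9, 0.1, 0.0],
    "in": [0.0, 1.0, 0.0],
    "of": [0.0, 1.0, 0.1],
    "visited": [0.0, 0.0, 1.0],
}
phrases = [["birthplace", "of"]]  # phrases assumed to describe a place-of-birth relation

print(keep_sentence(["born", "in"], phrases, toy_emb))  # path supports the label
print(keep_sentence(["visited"], phrases, toy_emb))     # path does not support it
```

Sentences that fail the check would be dropped before training the PCNN classifier. The greedy matching is chosen here only for brevity; an optimal one-to-one assignment would be a reasonable alternative.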
Keywords/Search Tags:entity relation extraction, distant supervision, semantic similarity, word embedding, attention mechanism