
Research On Relation Extraction Based On Distant Supervision Labeled Data

Posted on: 2020-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Q He | Full Text: PDF
GTID: 2518305780459114 | Subject: Computer Science and Technology
Abstract/Summary:
Relation extraction is one of the key technologies for text understanding: it carries the analysis of text from the linguistic level up to the level of content understanding. Under traditional supervised learning, relation extraction requires a large, manually constructed training corpus for each target relation. Manual annotation, however, is time-consuming and laborious, which severely limits the size and domain coverage of the data. Distant supervision, a newer paradigm for building data, automatically generates training data for relation extraction and greatly reduces the dependence on manually labeled data. However, datasets built by distant supervision inevitably suffer from the wrong-labeling problem. Although this problem has attracted the attention of many researchers and has been partly addressed, existing distantly supervised relation extraction still faces several problems: insufficient feature representation, inaccurate assessment of labeled data, incomplete data construction, and potential flaws in model evaluation methods. To address these problems, this thesis makes the following contributions:

(1) Because neural relation extraction models encode sentence semantics insufficiently, this thesis proposes to learn syntax-aware entity context representations from the dependency tree and to combine them with features based on the word sequence, yielding semantically richer sentence representations. First, through syntactic analysis of each sentence, the thesis links the entities and builds three tree-structured neural network models over the dependency tree to capture the context features of the entities. Then, the entity context information is combined with the word sequence information, and a self-attention mechanism automatically identifies the features most closely related to the entity relation. Experimental results show that the enhanced feature representation effectively improves the performance of the relation extraction system.

(2) Because the accuracy of labels in distantly supervised data cannot be reliably assessed, this thesis proposes a sentence selector trained with reinforcement learning that automatically selects the correctly labeled sentences in a bag for a given relation type. To make full use of the sentences in the bag that the selector does not identify accurately but that may still be informative, the selector's output is split into a positive bag and an unlabeled bag, and relation extraction is then treated as a positive-unlabeled (PU) learning problem. During training, the model builds semantic representations of both the positive bag and the unlabeled bag and combines them into a single bag representation specific to the given relation type, which improves relation prediction. Experiments verify the effectiveness of both the sentence selector and the relation classifier.

(3) Because corpora built by distant supervision are not comprehensive enough, this thesis constructs and releases a dataset for Chinese inter-personal relation extraction (IPRE). First, the person relation categories are built from the Chinese Baidu Encyclopedia, compensating for the lack of well-organized Chinese knowledge bases that provide entity-relation triples. To avoid the incorrect evaluation caused by distant supervision, the development and test sets are annotated manually. Finally, based on the IPRE corpus, the thesis defines three types of relation extraction tasks according to the characteristics of distant supervision and multi-instance learning, and designs more reasonable and effective evaluation criteria for measuring the performance of relation extraction models. To support subsequent research on relation extraction with IPRE, several benchmark systems are provided together with experimental comparisons and analysis of the results.
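Contribution (1) fuses entity context features derived from a dependency tree with word-sequence features through self-attention. The sketch below illustrates only that fusion step, under loud assumptions: the entity-context vectors are taken as already computed by some tree-structured network (not shown), attention uses no learned query/key/value projections, and mean pooling produces the sentence vector. The function names are hypothetical; this is not the thesis's actual architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = seq (no learned projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Each output vector is a convex combination of the input vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def sentence_representation(word_feats: List[Vector], entity_ctx: List[Vector]) -> Vector:
    """Hypothetical fusion: append entity-context vectors (assumed to come from a
    tree-structured network over the dependency parse) to the word-sequence
    features, attend over the joint sequence, then mean-pool to one vector."""
    joint = word_feats + entity_ctx
    attended = self_attention(joint)
    d = len(joint[0])
    return [sum(vec[j] for vec in attended) / len(attended) for j in range(d)]


# Toy 2-dimensional features: three word positions plus one entity-context vector.
rep = sentence_representation([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [[1.0, 1.0]])
print(rep)
```

Because attention weights form a probability distribution, every output coordinate stays within the range of the inputs; a trained model would add learned projections so that the attention scores reflect relation-relevant features rather than raw similarity.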
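Contribution (2) splits each bag into a positive bag and an unlabeled bag using a sentence selector trained with reinforcement learning. The following is a minimal sketch, assuming a logistic (Bernoulli) policy over fixed sentence feature vectors, a plain REINFORCE update driven by a scalar reward that a relation classifier would supply, and a simple weighted average with a hypothetical weight `alpha` to combine the two bag representations. A real selector would also condition on the target relation type, which is omitted here; all names and the combination rule are illustrative, not the thesis's exact formulation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_sentences(bag: List[Vector], policy_w: Vector,
                     rng: random.Random) -> Tuple[List[int], List[int], List[float]]:
    """Sample a keep/drop action per sentence from a logistic (Bernoulli) policy.
    Returns indices of the positive bag, indices of the unlabeled bag, and the
    per-sentence selection probabilities."""
    positive, unlabeled, probs = [], [], []
    for i, sent in enumerate(bag):
        p = sigmoid(dot(policy_w, sent))
        probs.append(p)
        (positive if rng.random() < p else unlabeled).append(i)
    return positive, unlabeled, probs


def reinforce_update(bag: List[Vector], policy_w: Vector, positive: List[int],
                     probs: List[float], reward: float, lr: float = 0.1) -> Vector:
    """One REINFORCE step. For a logistic policy, the gradient of log pi(a | s)
    with respect to the weights is (a - p) * s, where a = 1 means 'selected'."""
    chosen = set(positive)
    new_w = list(policy_w)
    for i, (sent, p) in enumerate(zip(bag, probs)):
        a = 1.0 if i in chosen else 0.0
        for j, x in enumerate(sent):
            new_w[j] += lr * reward * (a - p) * x
    return new_w


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    # An empty bag contributes a zero vector.
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def bag_representation(bag: List[Vector], positive: List[int], unlabeled: List[int],
                       alpha: float = 0.7) -> Vector:
    """Hypothetical combination rule: alpha * mean(positive) + (1 - alpha) * mean(unlabeled)."""
    dim = len(bag[0])
    pos = mean_vector([bag[i] for i in positive], dim)
    unl = mean_vector([bag[i] for i in unlabeled], dim)
    return [alpha * p + (1.0 - alpha) * u for p, u in zip(pos, unl)]


rng = random.Random(0)
bag = [[1.0, 0.2], [0.1, 0.9], [0.8, 0.8]]  # toy sentence feature vectors
w = [0.5, -0.5]
pos_idx, unl_idx, probs = select_sentences(bag, w, rng)
# The reward would come from the relation classifier; 1.0 is a placeholder.
w = reinforce_update(bag, w, pos_idx, probs, reward=1.0)
print(pos_idx, unl_idx, bag_representation(bag, pos_idx, unl_idx))
```

Keeping the unlabeled bag, rather than discarding unselected sentences, is what lets the downstream classifier still draw signal from sentences the selector was unsure about; the weight on that bag controls how much it is trusted.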
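Contribution (3) evaluates models against manually annotated development and test sets instead of distantly supervised labels. The three task definitions and evaluation criteria of the thesis are not detailed in this abstract, so the sketch below is only a generic illustration under stated assumptions: micro-averaged precision, recall and F1 over (head, tail, relation) triples, ignoring a hypothetical "NA" (no relation) label. The toy triples are invented and the relation names are placeholders, not IPRE labels.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def micro_prf(predicted: Set[Triple], gold: Set[Triple],
              na_label: str = "NA") -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over relation triples.
    Triples labeled with the no-relation label are ignored on both sides."""
    pred = {t for t in predicted if t[2] != na_label}
    ref = {t for t in gold if t[2] != na_label}
    tp = len(pred & ref)  # triples that exactly match the manual annotation
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, invented triples; the relation names are placeholders, not IPRE labels.
gold = {("A", "B", "spouse"), ("A", "C", "parent"), ("B", "C", "NA")}
pred = {("A", "B", "spouse"), ("A", "D", "friend"), ("B", "C", "NA")}
print(micro_prf(pred, gold))
```

Scoring against a hand-checked reference set matters because, with distantly supervised test labels, a model can be penalized for correct predictions that the knowledge base happens to miss, or rewarded for reproducing its wrong labels.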
Keywords/Search Tags: Relation Extraction, Distant Supervision, Multi-instance Learning, Neural Network, Reinforcement Learning, Corpus