
Research On Training Example Selection In Distant Supervision For Relation Extraction

Posted on: 2021-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Gui
Full Text: PDF
GTID: 1488306557985109
Subject: Software engineering
Abstract/Summary:
Distant supervision for relation extraction produces large-scale labeled data automatically under the supervision of existing knowledge bases, thus reducing the reliance on manual annotation. However, the noise in these automatically labeled data may hurt the performance of relation extraction if the data are used for training directly. Training example selection is an important task for addressing noisy data in distant supervision for relation extraction: it selects training examples with correct labels from the training set to reduce the impact of noise on relation extraction performance.

There are two types of training example selection approaches, i.e., implicit and explicit. The implicit approaches are mainly based on Probabilistic Graphical Models (PGM) and Deep Neural Networks (DNN). The former estimates confidence scores of training examples via hidden variables, and examples with high confidence scores are used for training; however, the remaining correct training examples cannot be fully utilized. The latter uses the attention mechanism to adjust the weights of training examples and thereby reduce the impact of noise on the relation extraction model; however, the noise cannot be removed from the training set directly. The explicit approaches are mainly based on domain knowledge and Reinforcement Learning (RL). The former utilizes a single type of domain knowledge and cannot comprehensively exploit multiple types of domain knowledge. The latter mainly employs on-policy reinforcement learning algorithms, and off-policy reinforcement learning algorithms have not been studied systematically for this task. To address these problems, we propose the following solutions.

(1) With respect to the problems of implicit training example selection approaches, an explicit approach based on Explanation-Based Learning (EBL) is proposed for the first time. It employs the Answer Set Programming (ASP) language to represent domain knowledge and training example selection rules, and the EBL algorithm is further improved to learn ASP rule sets from imperfect domain knowledge. This approach can make full use of the correct training examples and remove noise from the training set. Experimental results show that it learns ASP rules for training example selection effectively and achieves a 30% improvement in recall over a PGM-based baseline.
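The abstract does not reproduce the learned ASP rule sets, so the following is only a hypothetical illustration of the explicit, rule-based selection step: an ASP-style rule (in the comment) and a plain Python filter applying the same test to distantly labeled sentences. The relation, keyword lexicon, and features are invented for illustration.

```python
# Hypothetical illustration of explicit training example selection.
# The dissertation's actual ASP rules are not given in the abstract;
# the rule and features below are invented for illustration only.
#
# An ASP-style selection rule might read:
#   select(S) :- labeled(S, "place_of_birth"),
#                mentions_birth_keyword(S).
# i.e., keep a sentence S distantly labeled place_of_birth only if it
# contains explicit birth-related evidence.

BIRTH_KEYWORDS = {"born", "birthplace", "native of"}  # assumed lexicon

def mentions_birth_keyword(sentence: str) -> bool:
    return any(k in sentence.lower() for k in BIRTH_KEYWORDS)

def select(example: dict) -> bool:
    """Keep an automatically labeled example only if a rule fires."""
    if example["label"] == "place_of_birth":
        return mentions_birth_keyword(example["sentence"])
    return True  # rules for other relations would go here

training_set = [
    {"sentence": "Obama was born in Honolulu.", "label": "place_of_birth"},
    {"sentence": "Obama visited Honolulu.", "label": "place_of_birth"},
]
selected = [ex for ex in training_set if select(ex)]  # drops the 2nd example
```

Unlike attention-based down-weighting, a rule that does not fire removes the noisy example from the training set entirely, which is the property the dissertation emphasizes.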
(2) With respect to conflicts between multiple types of domain knowledge, an explicit approach based on Markov Logic Networks (MLN) is proposed. It includes a novel MLN model that captures the relationships between different types of domain knowledge for training example selection (the generic MLN formulation it builds on is sketched after point (3) below). Experimental results show that this approach selects effective domain knowledge for different relations and improves average F1 by 22% on the New York Times (NYT) data set and by 27% on the Wikipedia data set over a baseline using a single type of domain knowledge.

(3) With respect to the lack of systematic study of off-policy reinforcement learning algorithms for training example selection, an explicit approach based on Deep Q-Networks (DQN) is proposed, and the performance of off-policy reinforcement learning algorithms on this task is studied systematically. Among the off-policy RL algorithms, a Top-k behavior policy is used for the first time to generate more effective experiences. Experimental results show that this approach effectively learns training example selection policies from trial-and-error experiences, without domain knowledge or manual annotation. In addition, the off-policy RL algorithms converge 6 times faster than on-policy RL algorithms without degrading training example selection performance.
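For reference on point (2): the abstract does not give the dissertation's specific MLN model, but any MLN defines a log-linear distribution over possible worlds, and combining conflicting knowledge types amounts to weighting their formulas. The standard formulation is

```latex
P(X = x) \;=\; \frac{1}{Z}\exp\!\Big(\sum_{i} w_i\, n_i(x)\Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big(\sum_{i} w_i\, n_i(x')\Big)
```

where \(n_i(x)\) counts the true groundings of first-order formula \(F_i\) in world \(x\) and \(w_i\) is its learned weight. Conflicting knowledge sources can coexist as soft formulas, and the learned weights determine which source dominates for a given relation.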
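A minimal sketch of the Top-k behavior policy idea from point (3), assuming a bag of candidate sentences scored by a Q-network. The dissertation's state features, reward, and network architecture are not given in the abstract, so q_values() and all sizes below are placeholders.

```python
import numpy as np

# Minimal sketch of a Top-k behavior policy for off-policy selection.
# q_values() stands in for the DQN; its features and weights here are
# random placeholders, not the dissertation's model.

rng = np.random.default_rng(0)

def q_values(states: np.ndarray) -> np.ndarray:
    """Stand-in for Q(s, select) scores; a real agent would run its
    Q-network over the candidate-sentence states here."""
    return states @ rng.normal(size=states.shape[1])

def top_k_behavior_policy(states: np.ndarray, k: int) -> np.ndarray:
    """Select the k candidates with the highest Q-values instead of a
    single epsilon-greedy action, so each rollout yields k transitions
    for the replay buffer."""
    q = q_values(states)
    return np.argsort(q)[-k:]  # indices of the top-k examples

# One "bag" of 32 candidate sentences with 8 features each (assumed sizes).
bag = rng.normal(size=(32, 8))
chosen = top_k_behavior_policy(bag, k=5)
# The resulting (state, action, reward, next_state) tuples would be
# stored in a replay buffer and reused for off-policy Q-learning,
# which is what allows faster convergence than on-policy methods.
```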
Keywords/Search Tags:Distant Supervision for Relation Extraction, Training Example Selection, Explanation-based Learning, Markov Logic Network, Reinforcement Learning