
Distant Supervision Based Relation Extraction Combining Clause Identification And Semi-supervised Ensemble Learning

Posted on: 2017-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: X K Yu
Full Text: PDF
GTID: 2348330512499483
Subject: Computer application technology
Abstract/Summary:
Relation extraction is an important component of information extraction, and distant supervision based relation extraction (DSRE) is currently an active research area. Distant supervision can generate a substantial amount of training data, which addresses the lack of labeled data that limits traditional supervised methods. It assumes that any sentence containing a pair of entities in a known relation is likely to express that relation in some way. However, this assumption fails in many cases, so some sentences are labeled wrongly, which can lead to poor extraction performance. Moreover, distant supervision generates far more negative instances than positive instances, and only a small fraction of the negative data is used so that the training data stays balanced between the two. The utilization of negative data in DSRE is therefore low and leaves considerable room for improvement.

To address the large amount of wrongly labeled data in DSRE, this paper proposes a novel noise reduction algorithm based on clause identification, NRCI (Noise Reduction by Clause Identification). NRCI identifies the clauses in a raw sentence and then checks whether the entity pair is contained in any clause. If it is not, the instance is removed from the training data. In this way, NRCI filters out noisy data effectively, and the experimental results show that it improves relation extraction performance.

To address the low utilization of negative data, this paper also proposes an improved semi-supervised ensemble learning algorithm, ETT. It uses the distant supervision training data as labeled data and the unused negative data as unlabeled data, and achieves better performance.
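The NRCI filtering step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how clauses are identified or exactly how containment is tested, so the sketch assumes that clause boundaries are already available as token spans from some clause identifier, and that an instance is kept only when both entities fall inside the same clause. The names `in_same_clause`, `nrci_filter`, and the instance fields are hypothetical.

```python
from typing import Dict, List, Tuple

# Half-open token span [start, end). Assumed representation, not from the thesis.
Span = Tuple[int, int]


def in_same_clause(e1_span: Span, e2_span: Span, clauses: List[Span]) -> bool:
    """Return True if some clause span contains both entity spans."""
    def inside(inner: Span, outer: Span) -> bool:
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    return any(inside(e1_span, c) and inside(e2_span, c) for c in clauses)


def nrci_filter(instances: List[Dict]) -> List[Dict]:
    """Drop instances whose entity pair is not contained in any single clause.

    Each instance is a dict with hypothetical keys 'e1_span', 'e2_span'
    and 'clauses' (clause spans assumed to come from a clause identifier).
    """
    return [
        inst for inst in instances
        if in_same_clause(inst["e1_span"], inst["e2_span"], inst["clauses"])
    ]


# Tokens: Steve(0) Jobs(1) founded(2) Apple(3) ,(4) and(5)
#         Bill(6) Gates(7) lives(8) in(9) Seattle(10)
clauses = [(0, 4), (6, 11)]  # two coordinated clauses, assumed given
instances = [
    # (Steve Jobs, Apple): both entities lie in the first clause, so it is kept.
    {"pair": ("Steve Jobs", "Apple"),
     "e1_span": (0, 2), "e2_span": (3, 4), "clauses": clauses},
    # (Steve Jobs, Seattle): the entities lie in different clauses, so it is removed.
    {"pair": ("Steve Jobs", "Seattle"),
     "e1_span": (0, 2), "e2_span": (10, 11), "clauses": clauses},
]

kept = nrci_filter(instances)
```

In this toy sentence, a distant supervision labeler would pair "Steve Jobs" with "Seattle" simply because both appear in the sentence; the clause check discards that instance because no single clause contains both entities.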
Keywords/Search Tags: relation extraction, distant supervision, clause identification, noise reduction, semi-supervised ensemble learning