
Relation Extraction Based On Bootstrapped Multi-Level Distant Supervision

Posted on: 2020-11-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y He  Full Text: PDF
GTID: 2428330578980910  Subject: Computer Science and Technology
Abstract/Summary:
Distantly supervised relation extraction has been widely used to identify new relational facts from free text. Distant supervision for Relation Extraction (RE) can scale the task to very large corpora with thousands of relations. However, relying on a single-node categorization model to identify relational facts for thousands of relations simultaneously inevitably leads to serious miscategorization, because (1) instances of closely related relations are easily mixed up, and (2) the imbalance of training data between relations can easily cause the categorizer to assign instances of relations with little training data to relations with abundant training data. Although some efforts have been made to address this problem, no satisfactory improvement has been achieved so far. In this thesis, we propose a multi-level distant supervision model for relation extraction, which divides the original categorization task into a number of sub-tasks at multiple levels of a constructed tree-like categorization structure. With the tree-like structure, an unlabelled relation instance is categorized step by step along a path from the root node to a leaf node. Beyond that, we propose bootstrapped distant supervision, which iteratively updates the distant supervision model with newly learned relational facts to further improve extraction precision and recall. Experiments on two real data sets show that our approach outperforms state-of-the-art approaches, reaching more than 10% better extraction quality.

In addition, iterative bootstrapped approaches commonly suffer from semantic drift. In this thesis, we propose a novel method to minimize semantic drift by identifying Drifting Points (DPs), which are the culprits that introduce semantic drift. Compared to previous approaches, which usually incur a substantial loss in recall, the DP-based cleaning method can effectively clean a large proportion of semantic drift errors while keeping recall high. The experimental results show that our DP-based cleaning method removes around 90% of incorrect instances or patterns with about 90% precision, which outperforms the previous approaches we compare with.
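
The root-to-leaf categorization described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the relation names, the two-level tree layout, and the keyword-based `keyword_router`, which merely stands in for whatever distantly supervised classifier would be trained at each internal node. The abstract does not specify how the tree is constructed or how the per-node models are learned, so this shows only the step-by-step routing idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One node of a tree-like categorization structure."""
    label: str
    # Maps a sentence to the label of one child; None for leaf nodes.
    classify: Optional[Callable[[str], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children


def keyword_router(rules: Dict[str, List[str]], default: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a trained per-node classifier."""
    def route(sentence: str) -> str:
        text = sentence.lower()
        for child_label, keywords in rules.items():
            if any(k in text for k in keywords):
                return child_label
        return default
    return route


def categorize(root: Node, sentence: str) -> List[str]:
    """Walk from the root to a leaf, making one local decision per level."""
    path = [root.label]
    node = root
    while not node.is_leaf():
        node = node.children[node.classify(sentence)]
        path.append(node.label)
    return path


# Hypothetical two-level tree: coarse relation groups first, then leaf relations.
person = Node(
    "person",
    keyword_router({"born_in": ["born"]}, default="spouse_of"),
    {"born_in": Node("born_in"), "spouse_of": Node("spouse_of")},
)
organization = Node(
    "organization",
    keyword_router({"founded_by": ["founded"]}, default="headquartered_in"),
    {"founded_by": Node("founded_by"), "headquartered_in": Node("headquartered_in")},
)
root = Node(
    "root",
    keyword_router({"person": ["born", "married"]}, default="organization"),
    {"person": person, "organization": organization},
)

print(categorize(root, "Marie Curie was born in Warsaw."))
print(categorize(root, "Apple was founded by Steve Jobs."))
```

Each internal node only has to separate its own children, so closely related relations are compared against each other in a small local decision rather than against every relation at once. This is the intuition behind splitting one large categorization task into sub-tasks.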
Keywords/Search Tags:Information Extraction, Relation Extraction, Data Cleaning, Bootstrapped Extraction, Distant Supervision