Bootstrapping-based Weakly Supervised Chinese Entity Relation Extraction

Posted on: 2011-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: X H Li
GTID: 2178330332966081
Subject: Software engineering
Abstract/Summary:
Semantic relation extraction between named entities is an important subtask of information extraction. Although supervised learning approaches have achieved some success in this area, they rely heavily on large-scale manually annotated corpora, whose construction is both time-consuming and labor-intensive.

This paper proposes a bootstrapping-based approach to weakly supervised Chinese semantic relation extraction. Given a small annotated dataset (the initial seed set) and a large unannotated dataset, the training set is expanded iteratively from the initial seeds, which yields better extraction results than the small annotated dataset alone would support. Furthermore, a clustering-based stratified seed sampling strategy is proposed for selecting the initial seed set: all relation instances are first clustered, seeds are then chosen from each cluster to form the initial seed set, and finally a weakly supervised Chinese semantic relation extraction system is bootstrapped from this seed set.

Experimental evaluation on the ACE RDC 2005 benchmark corpus shows that the F-measure of weakly supervised relation extraction for major-type relation classification with this strategy is 63.4, outperforming random sampling (57.9) and sequential sampling (52.4) by 5.5 and 11.0 points respectively. This demonstrates that the method can significantly improve the performance of weakly supervised Chinese semantic relation extraction.
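
The two steps described in the abstract, stratified seed selection over clusters followed by iterative expansion of the training set, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not specify the features, the clustering algorithm's settings, the classifier, or how many instances are added per round. The helper names (`stratified_seeds`, `bootstrap`, `train_centroids`, `predict_centroid`) are hypothetical, cluster assignments are taken as given, seeds are allocated in proportion to cluster size, and a toy nearest-centroid classifier over one-dimensional values stands in for a real relation classifier.

```python
import random
from collections import defaultdict


def stratified_seeds(cluster_of, n_seeds, rng):
    """Draw roughly n_seeds instances, proportionally from each cluster.

    cluster_of maps instance id -> cluster id (clustering is assumed done).
    Every cluster contributes at least one seed.
    """
    clusters = defaultdict(list)
    for inst, cluster in cluster_of.items():
        clusters[cluster].append(inst)
    total = len(cluster_of)
    seeds = []
    for cluster in sorted(clusters):
        members = clusters[cluster]
        k = max(1, round(n_seeds * len(members) / total))
        seeds.extend(rng.sample(members, min(k, len(members))))
    return seeds


def train_centroids(labeled):
    """Toy model: mean feature value per label (instances are floats)."""
    groups = defaultdict(list)
    for x, label in labeled.items():
        groups[label].append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict_centroid(model, x):
    """Return (label, confidence); confidence is negative distance."""
    label = min(model, key=lambda lab: abs(model[lab] - x))
    return label, -abs(model[label] - x)


def bootstrap(seed_labeled, unlabeled, train, predict, batch=2, rounds=5):
    """Self-training loop: each round, retrain and move the `batch` most
    confidently predicted unlabeled instances into the training set."""
    labeled = dict(seed_labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        scored = sorted(
            ((predict(model, x), x) for x in pool),
            key=lambda item: item[0][1],
            reverse=True,
        )
        for (label, _conf), x in scored[:batch]:
            labeled[x] = label
            pool.remove(x)
    return labeled
```

A possible usage: cluster the annotated relation instances, call `stratified_seeds` to pick which ones keep their labels as the seed set, then pass those seeds and the unannotated instances to `bootstrap`. Drawing seeds from every cluster is meant to make the seed set cover the different kinds of relation instances, which is the motivation the abstract gives for preferring it over random or sequential sampling.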
Keywords/Search Tags: Chinese Semantic Relation Extraction, Weakly Supervised Learning, Bootstrapping, Hierarchical Clustering