
Research On Semi-Supervised Semantic Relation Extraction Between Named Entities

Posted on: 2009-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: B Xi
GTID: 2178360245963704 | Subject: Computer application technology
Abstract/Summary:
Semantic Relation Extraction (SRE) identifies and classifies the relationship between two entities in text, and it plays a critical role in Information Extraction. SRE currently faces two major challenges: the lack of training data and the imbalanced distribution of instances across semantic relation types. Although supervised learning dominates SRE research, it requires a large number of manually labeled relation instances. To overcome this problem, semi-supervised learning has recently attracted increasing attention.

This thesis proposes a semi-supervised learning approach to SRE based on bootstrapping. The procedure is iterative: a classifier is first trained on the given labeled data, the trained classifier then labels the unlabeled instances, and finally the classified instances with high reliability are added to the labeled data. In particular, three critical factors in semi-supervised learning are systematically explored: the selection of the initial labeled data, the augmentation of the labeled data, and the stopping criterion.

For selecting the initial labeled data, this thesis proposes a statistical stratified strategy to obtain a balanced and representative set: all the data are first clustered into different classes, and a proportional number of representative instances is then selected from each class.

For augmenting the labeled data, the classified instances with high confidence are selected first, and the same stratified strategy is then applied to choose a balanced and representative subset, which maintains the quality of the labeled data.

Finally, an efficient and effective stopping criterion is explored to achieve local convergence. Evaluation on the ACE RDC corpora shows that the stratified strategy resolves the imbalance problem and preserves the representativeness of the labeled data better than other strategies, and that the proposed semi-supervised approach substantially outperforms state-of-the-art approaches.
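The bootstrapping loop and the stratified selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: instances are toy 2-D feature vectors rather than real relation-instance features; the classifier is a nearest-centroid model whose confidence is d2/(d1+d2), where d1 and d2 are the distances to the two nearest centroids; clustering uses k-means with farthest-first seeding; quotas give every stratum one slot and split the rest by stratum size (largest remainder); and the stopping rule (stop once no unlabeled instance reaches the confidence threshold) is a simple stand-in, since the abstract does not specify the actual criterion. All function names (`stratified_sample`, `bootstrap`, `toy_data`, etc.) and parameter values are hypothetical.

```python
# Hypothetical sketch of stratified bootstrapping; names and parameters are illustrative.
import math
import random
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(vectors) for xs in zip(*vectors))


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first seeding; returns cluster ids."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(math.dist(v, c) for c in centers)))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = mean(members)
    return assign


def stratified_sample(points, candidates, budget, k):
    """Cluster candidates into at most k strata; every stratum gets one slot, the
    remaining slots are split by stratum size (largest remainder); each quota is
    filled with the instances closest to the stratum centroid."""
    if budget <= 0:
        return []
    if len(candidates) <= budget:
        return list(candidates)
    assign = kmeans([points[i] for i in candidates], min(k, budget))
    strata = defaultdict(list)
    for i, a in zip(candidates, assign):
        strata[a].append(i)
    spare = budget - len(strata)
    share = {s: spare * len(m) / len(candidates) for s, m in strata.items()}
    quota = {s: 1 + int(share[s]) for s in strata}
    leftover = budget - sum(quota.values())
    for s in sorted(strata, key=lambda s: share[s] - int(share[s]), reverse=True)[:leftover]:
        quota[s] += 1
    chosen = []
    for s, members in strata.items():
        centroid = mean([points[i] for i in members])
        members.sort(key=lambda i: math.dist(points[i], centroid))
        chosen.extend(members[:quota[s]])
    return chosen


def train(points, labeled):
    """Nearest-centroid classifier (assumed): one mean vector per relation type."""
    groups = defaultdict(list)
    for i, y in labeled.items():
        groups[y].append(points[i])
    return {y: mean(vs) for y, vs in groups.items()}


def predict(model, x):
    """Return (label, confidence); confidence = d2 / (d1 + d2) lies in [0.5, 1]."""
    ranked = sorted(model, key=lambda y: math.dist(x, model[y]))
    if len(ranked) == 1:
        return ranked[0], 1.0
    d1, d2 = (math.dist(x, model[y]) for y in ranked[:2])
    return ranked[0], (d2 / (d1 + d2) if d1 + d2 > 0 else 0.5)


def bootstrap(points, gold, n_seed=6, batch=4, threshold=0.8, k=3, max_iter=50):
    everything = list(range(len(points)))
    # 1. Initial labeled set: stratified seeds; gold labels stand in for manual annotation.
    labeled = {i: gold[i] for i in stratified_sample(points, everything, n_seed, k)}
    for _ in range(max_iter):
        model = train(points, labeled)
        pool = [i for i in everything if i not in labeled]
        preds = {i: predict(model, points[i]) for i in pool}
        # 2. Augmentation: keep confident predictions, then re-apply stratified selection.
        confident = [i for i in pool if preds[i][1] >= threshold]
        # 3. Stopping criterion (assumed stand-in): no confident unlabeled instance remains.
        if not confident:
            break
        for i in stratified_sample(points, confident, batch, k):
            labeled[i] = preds[i][0]
    return labeled


def toy_data(seed=1):
    """Three imbalanced, well-separated 2-D 'relation types' with 20/8/4 instances."""
    rng = random.Random(seed)
    points, gold = [], []
    for y, (cx, cy), n in [("A", (0, 0), 20), ("B", (10, 0), 8), ("C", (0, 10), 4)]:
        for _ in range(n):
            points.append((cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)))
            gold.append(y)
    return points, gold
```

Running `points, gold = toy_data()` followed by `bootstrap(points, gold)` labels the toy pool a few instances at a time. The one-slot-per-stratum rule is what keeps the 4-instance minority type in the 6-instance seed set: plain floor rounding of 6 × 4/32 = 0.75 would give it no seeds at all, which is exactly the imbalance problem the stratified strategy targets.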
Keywords/Search Tags: Information Extraction, Semantic Relation Extraction, Semi-supervised Learning, Bootstrapping, Stratified Strategy