Research On Entity Linking Approaches Based On Active Learning

Posted on: 2018-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Wu
GTID: 2348330542968907
Subject: Computer Science and Technology
Abstract/Summary:
Entity linking is the task of determining the identity of entities mentioned in text. Supervised and unsupervised learning approaches have been widely used for entity linking over the past decade. However, only a few studies have addressed accelerating model training and improving corpus construction. Active learning interactively selects the most useful samples during the learning process and passes them to annotators for manual annotation. It can thereby reduce the number of training samples required while maintaining or improving model performance.

This thesis analyzes the characteristics of the entity linking task and applies active learning to both model training and corpus construction. Its main contributions are twofold:

(1) For supervised entity linking models, this thesis reduces human annotation effort through active learning and proposes two approaches. The first is an initial sample selection approach based on popularity, known as sampling by popularity (SBP). The second is an iterative training sample selection approach that combines uncertainty and popularity, known as sampling by uncertainty and popularity (SUP). Together, they ensure that the initial training samples are representative in the initial selection stage, and that both the uncertainty and the representativeness of the selected samples are considered in the subsequent iterative training stage.

(2) For constructing an entity linking corpus, this thesis proposes an annotation approach based on active learning and unsupervised learning to improve annotation quality. With this approach, the most informative unlabeled mentions are identified for annotators to label, while the precision of the whole corpus is improved by propagating evidence from the labeled mentions.

The experiments show two main results. First, SBP and SUP effectively accelerate the training of the entity linking model. Second, the annotation approach based on active learning and unsupervised learning effectively improves the accuracy of a silver-standard entity linking corpus while requiring fewer mentions to be annotated.
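The abstract names SBP and SUP but gives no formulas, so the following is only a minimal sketch of how such selection steps might look, not the thesis's actual method. Everything concrete here is an assumption made for illustration: the `popularity` field (taken to be a score in [0, 1]), the use of normalised entropy as the uncertainty measure, the linear weighting `alpha`, and the toy `predict` function standing in for a trained entity linking model.

```python
import math


def entropy(probs):
    """Shannon entropy of a candidate-entity distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_by_popularity(pool, k):
    """SBP-style seed selection (assumed form): take the k most popular mentions."""
    return sorted(pool, key=lambda m: m["popularity"], reverse=True)[:k]


def sample_by_uncertainty_and_popularity(pool, predict, k, alpha=0.5):
    """SUP-style selection (assumed form): rank mentions by a weighted mix of
    normalised model uncertainty and popularity, then take the top k."""
    def score(mention):
        probs = predict(mention)
        # Normalise entropy to [0, 1] by the maximum for this candidate count.
        max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
        uncertainty = entropy(probs) / max_entropy
        return alpha * uncertainty + (1 - alpha) * mention["popularity"]

    return sorted(pool, key=score, reverse=True)[:k]


# Hypothetical unlabeled pool; popularity values are made up for illustration.
pool = [
    {"mention": "Jordan", "popularity": 0.9},
    {"mention": "Paris", "popularity": 0.6},
    {"mention": "Mercury", "popularity": 0.3},
]

# Hypothetical model output: probabilities over each mention's candidate entities.
toy_probs = {
    "Jordan": [0.9, 0.1],
    "Paris": [0.5, 0.5],
    "Mercury": [0.6, 0.4],
}


def predict(mention):
    """Stand-in for a trained entity linking model."""
    return toy_probs[mention["mention"]]


seed = sample_by_popularity(pool, k=1)
next_batch = sample_by_uncertainty_and_popularity(pool, predict, k=2)
print([m["mention"] for m in seed])
print([m["mention"] for m in next_batch])
```

In this sketch the seed set depends on popularity alone, because no model exists yet to estimate uncertainty. Later rounds trade the two signals off through `alpha`, which mirrors the abstract's description of considering both uncertainty and representativeness during iterative training.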
Keywords/Search Tags: Entity Linking, Active Learning, Corpus Construction