Enhancing Named Entity Recognition With Data Intervention

Posted on: 2022-03-30   Degree: Master   Type: Thesis
Country: China   Candidate: X J Zeng
GTID: 2518306551453454   Subject: Master of Engineering
Abstract/Summary:
Named entity recognition (NER) is an upstream task for many natural language processing tasks and lays the foundation for information retrieval, intelligent dialogue, and reading comprehension. Recent progress of deep learning models suggests that entity recognition is no longer a difficult task, but these models depend on large amounts of labeled data, so NER systems usually need large annotated datasets to achieve good results. This dependence becomes a serious limitation in many scenarios. In the medical field, for example, privacy protection makes large-scale annotated data difficult to obtain. To address this problem, this thesis makes the following contributions from the perspective of data intervention:
1. It proposes a counterfactual example generator from a causal perspective. The generator decouples, reorganizes, and discriminates the entities and contexts of a small number of observed examples to automatically generate many counterfactual examples, which can then serve as additional training data for NER models.
2. It abstracts the existing gazetteer-enhanced training process of NER models and develops a highly decoupled open-source framework on which users can run quick experiments or add new datasets, gazetteers, and models.
3. Using this framework, it conducts a series of empirical analyses of existing gazetteer-enhanced models. It first examines whether a gazetteer can further improve the performance of a pretrained language model, and then explores the relationship between the characteristics of the gazetteer and model performance.
Experiments show that the proposed counterfactual generator significantly improves entity recognition when only a small amount of annotated data is available, increasing the F1 score by more than 5% on average when there are only a few hundred annotated examples. In addition, empirical analyses of a series of gazetteer-enhanced NER models show that gazetteers help in most cases, especially when baseline performance is poor. The thesis also finds that a suitable gazetteer should include high-quality pretrained lexicon embeddings and as many as possible of the entities that appear in both the training set and the test set.
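The counterfactual idea in contribution 1 can be illustrated with a minimal sketch: intervene on the entity while holding the context fixed, by swapping each entity mention for another observed mention of the same type. This is not the thesis's generator. It assumes BIO-tagged token lists, and the discriminator step is reduced to a hypothetical `is_plausible` callable that accepts everything by default; all function names here are illustrative.

```python
import random
from typing import Callable, Dict, List, Tuple

# An example is a token list paired with BIO tags of the same length.
Example = Tuple[List[str], List[str]]


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decouple: return (start, end_exclusive, type) for each BIO entity span."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # B-X, or an I-X that cannot continue the open span
            start, etype = i, tag[2:]
    return spans


def build_entity_pool(examples: List[Example]) -> Dict[str, List[List[str]]]:
    """Collect every observed entity mention, grouped by entity type."""
    pool: Dict[str, List[List[str]]] = {}
    for tokens, tags in examples:
        for s, e, t in extract_spans(tags):
            pool.setdefault(t, []).append(tokens[s:e])
    return pool


def generate_counterfactuals(
    examples: List[Example],
    n_per_example: int = 3,
    is_plausible: Callable[[List[str], List[str]], bool] = lambda toks, tags: True,
    seed: int = 0,
) -> List[Example]:
    """Reorganize: keep each context fixed and swap in same-type entities.

    `is_plausible` is a hypothetical stand-in for a discriminator that
    filters implausible samples; the default accepts every candidate.
    """
    rng = random.Random(seed)
    pool = build_entity_pool(examples)
    seen = {(tuple(t), tuple(g)) for t, g in examples}  # skip originals and repeats
    generated: List[Example] = []
    for tokens, tags in examples:
        spans = extract_spans(tags)
        if not spans:
            continue  # nothing to intervene on
        for _ in range(n_per_example):
            new_tokens: List[str] = []
            new_tags: List[str] = []
            prev = 0
            for s, e, t in spans:
                repl = rng.choice(pool[t])
                new_tokens += tokens[prev:s] + repl
                new_tags += tags[prev:s] + ["B-" + t] + ["I-" + t] * (len(repl) - 1)
                prev = e
            new_tokens += tokens[prev:]
            new_tags += tags[prev:]
            key = (tuple(new_tokens), tuple(new_tags))
            if key not in seen and is_plausible(new_tokens, new_tags):
                seen.add(key)
                generated.append((new_tokens, new_tags))
    return generated
```

For example, from "Alice lives in Paris" and "Bob Smith visited Tokyo" the sketch can produce "Bob Smith lives in Tokyo" with correctly realigned tags. Replacing `is_plausible` with a trained classifier would be one way to approximate the discrimination step.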
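Gazetteer-enhanced models differ in how they inject gazetteer knowledge, and the abstract does not say which mechanisms the framework covers. One common ingredient is a per-token match feature computed by looking sentences up in the gazetteer. The sketch below shows that step with greedy longest matching; it is an assumed, generic technique, not the thesis's framework, and `gazetteer_match_tags` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def gazetteer_match_tags(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Greedy longest-match lookup; returns one BIO-style feature tag per token."""
    max_len = max((len(k) for k in gazetteer), default=0)
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so "New York Times" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = gazetteer.get(tuple(tokens[i:i + n]))
            if etype is not None:
                feats[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    feats[j] = "I-" + etype
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return feats


gaz = {("New", "York"): "LOC", ("New", "York", "Times"): "ORG", ("Alice",): "PER"}
print(gazetteer_match_tags("Alice read the New York Times".split(), gaz))
# ['B-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG']
```

A model would typically embed these feature tags and combine them with its word or pretrained representations; how that combination is done is exactly where the compared gazetteer-enhanced models differ.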
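The last finding suggests a simple diagnostic for choosing a gazetteer: among the entities that occur in both the training and test sets, what fraction does the gazetteer contain? The sketch below computes that ratio. It assumes entities are compared as exact strings, and `shared_entity_coverage` is a hypothetical helper; it only measures the characteristic and does not reproduce the thesis's analysis of how it relates to performance.

```python
from typing import Iterable


def shared_entity_coverage(
    gazetteer: Iterable[str],
    train_entities: Iterable[str],
    test_entities: Iterable[str],
) -> float:
    """Fraction of entities seen in BOTH train and test that the gazetteer contains."""
    shared = set(train_entities) & set(test_entities)
    if not shared:
        return 0.0  # no overlap, so coverage is undefined; report 0 by convention
    return len(shared & set(gazetteer)) / len(shared)


print(shared_entity_coverage(
    {"Paris", "Tokyo", "Berlin"},
    ["Paris", "London", "Tokyo"],
    ["Paris", "London", "Rome"],
))  # 0.5: "Paris" and "London" are shared, and only "Paris" is in the gazetteer
```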
Keywords/Search Tags:Natural Language Processing, Named Entity Recognition, Causality, Toolkit, Gazetteer