
Research On Cross-domain Named Entity Recognition Method

Posted on: 2022-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: B Feng | Full Text: PDF
GTID: 2518306572960269 | Subject: Software engineering
Abstract/Summary:
Named entity recognition (NER) is the task of identifying entities with specific meanings in natural language text, such as the names of people, places, and organizations. With the rapid development of the Internet, the demand for NER is no longer limited to these three traditional entity types; it has expanded to named entities in a wide range of professional domains. Given large-scale data, deep learning methods achieve strong results on NER tasks. However, because annotation resources are limited, large-scale annotated data is often unavailable in the target domain, and applying deep learning methods directly yields unsatisfactory results. This thesis therefore studies cross-domain NER, which uses labeled data from a source domain together with unlabeled data from the target domain to improve NER performance in the target domain. The task poses several difficulties: 1) a feature representation of entities shared across domains must be found; 2) the pre-training task is unrelated to the objective of the NER task; 3) the model structures used in the source and target domains are difficult to unify completely. The research work of this thesis is as follows.

1. Cross-domain NER based on sequence labeling. This thesis first studies pre-trained language models and shows that domain pre-training effectively improves NER performance. Then, building on a sequence labeling model, it compares different ways of using source-domain data for cross-domain NER. Finally, it also evaluates a cross-domain NER method based on parameter generation.

2. Cross-domain NER based on masked keyword pre-training. Examining the relationship between keywords and named entities, this thesis finds that the two overlap to a high degree. Because the traditional pre-training process is unrelated to the downstream NER task, it does little to improve performance in the target domain. This thesis therefore uses a domain-independent keyword extraction model to extract keywords from unlabeled data in each target domain, and proposes masking and predicting these keywords during pre-training. Experimental results show that masked keyword pre-training effectively improves NER performance in the target domain.

3. Cross-domain NER based on reading comprehension. This thesis introduces reading-comprehension-based NER into the cross-domain setting, which makes the model structures of the source and target domains consistent. It studies the advantages of this approach, compares different question construction methods, and resolves the problem of decoding conflicts. Because the target-domain data is limited in size, the model is prone to overfitting; this thesis therefore adds an adversarial training task to the reading-comprehension-based cross-domain NER model. Compared with the best previously reported results, the proposed method achieves an average improvement of 1.91 across the five target domains and an improvement of 0.61 in the biomedical domain, demonstrating the effectiveness of cross-domain NER based on reading comprehension and adversarial training.
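The sequence-labeling line of work treats NER as assigning one tag per token, with entity spans read off the tag sequence. The abstract does not name a tagging scheme; the sketch below assumes the common BIO scheme and shows only the decoding step that turns predicted tags into (type, start, end) spans. The function name `bio_to_spans` is hypothetical, and the tagging model itself is out of scope.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (type, start, end) spans, end inclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            continue  # the open entity keeps growing
        # Any other tag closes the span that is currently open, if any.
        if current_type is not None:
            spans.append((current_type, start, i - 1))
            current_type = None
        if tag.startswith("B-") or tag.startswith("I-"):
            # B- starts a span; an orphan I- is leniently treated the same way.
            current_type, start = tag[2:], i
    if current_type is not None:
        spans.append((current_type, start, len(tags) - 1))
    return spans

print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
# [('PER', 0, 1), ('LOC', 3, 3)]
```

Treating an orphan I- tag as a span start is a design choice; a stricter decoder would discard such tags instead.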
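Masked keyword pre-training changes which tokens are hidden during masked-language-model pre-training: instead of sampling positions uniformly at random, it masks keywords extracted from target-domain text and trains the model to predict them. The following is a minimal word-level sketch, assuming the keyword set has already been produced by an external extractor (the abstract does not name the model) and that each keyword occurrence is masked with a fixed probability. `mask_keywords`, `mask_prob`, and the `[MASK]` string are illustrative choices, not the thesis's implementation.

```python
import random
from typing import List, Optional, Set, Tuple

MASK_TOKEN = "[MASK]"  # assumed placeholder string

def mask_keywords(
    tokens: List[str],
    keywords: Set[str],
    mask_prob: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask keyword tokens and build labels for the prediction loss.

    Only tokens in `keywords` are candidates, each masked with probability
    `mask_prob`. Labels keep the original token at masked positions and
    None elsewhere, so the loss would be computed on masked keywords only.
    """
    rng = rng if rng is not None else random.Random()
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if tok in keywords and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)  # the model must recover this keyword
        else:
            masked.append(tok)
            labels.append(None)  # not a prediction target
    return masked, labels

tokens = ["the", "patient", "received", "insulin", "therapy"]
masked, labels = mask_keywords(tokens, {"insulin", "therapy"})
print(masked)  # ['the', 'patient', 'received', '[MASK]', '[MASK]']
```

A practical setup would work on subword IDs rather than words and might mix keyword masking with ordinary random masking; the sketch only shows how the masking target shifts toward keywords.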
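The reading-comprehension formulation turns NER into span extraction: for each entity type, a question is paired with the input text, and the model predicts start and end probabilities for every token. Because each type is queried independently, two types can claim overlapping spans, which is one plausible form of the decoding conflict the abstract mentions. The sketch below assumes hypothetical question templates (`QUERIES`), a nearest-end pairing rule, and greedy highest-score-first conflict resolution; the abstract does not say how the thesis constructs questions or resolves conflicts, and the hard-coded probabilities stand in for a model's outputs.

```python
from typing import Dict, List, Tuple

# Hypothetical question templates, one per entity type (placeholders only).
QUERIES: Dict[str, str] = {
    "PER": "Which person names are mentioned in the text?",
    "LOC": "Which locations are mentioned in the text?",
}

Span = Tuple[str, int, int, float]  # (entity type, start, end, score)

def build_mrc_inputs(context: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair every entity-type question with the same context."""
    return [(etype, query, context) for etype, query in QUERIES.items()]

def decode_type(entity_type: str, p_start: List[float], p_end: List[float],
                threshold: float = 0.5, max_len: int = 10) -> List[Span]:
    """Pair each likely start with the nearest likely end at or after it."""
    spans: List[Span] = []
    n = len(p_start)
    for s in range(n):
        if p_start[s] < threshold:
            continue
        for e in range(s, min(n, s + max_len)):
            if p_end[e] >= threshold:
                spans.append((entity_type, s, e, p_start[s] * p_end[e]))
                break  # keep only the nearest end
    return spans

def resolve_conflicts(spans: List[Span]) -> List[Span]:
    """Keep spans from highest to lowest score, dropping any that overlap."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda sp: sp[3], reverse=True):
        _, s, e, _ = span
        if all(e < ks or s > ke for _, ks, ke, _ in kept):
            kept.append(span)
    return sorted(kept, key=lambda sp: sp[1])

context = ["Jordan", "flew", "to", "Jordan"]
inputs = build_mrc_inputs(context)  # one (type, question, context) per type
# Stand-in start/end probabilities for each question.
per = decode_type("PER", [0.9, 0.0, 0.0, 0.6], [0.9, 0.0, 0.0, 0.6])
loc = decode_type("LOC", [0.1, 0.0, 0.0, 0.95], [0.1, 0.0, 0.0, 0.95])
print([(t, s, e) for t, s, e, _ in resolve_conflicts(per + loc)])
# [('PER', 0, 0), ('LOC', 3, 3)]
```

Here both questions claim the second "Jordan"; the higher-scoring LOC span wins and the conflicting PER span is dropped, while the non-overlapping PER span survives.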
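For the adversarial training task, one widely used option is the fast gradient method (FGM), which perturbs the input embedding along the loss gradient, r_adv = ε·g/‖g‖₂, and trains on both the clean and the perturbed input. The abstract does not say which adversarial scheme the thesis uses, so FGM here is an assumption. To keep the gradient writable by hand, the sketch uses a toy logistic-regression loss over a single embedding vector; in a real NER model the gradient would come from automatic differentiation and the perturbation would be applied to the embedding layer. All function names are hypothetical.

```python
import math
from typing import List

def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(x: List[float], w: List[float], y: int) -> float:
    """Binary cross-entropy of a toy classifier sigmoid(w . x)."""
    p = sigmoid(_dot(w, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_wrt_input(x: List[float], w: List[float], y: int) -> List[float]:
    """d(loss)/dx = (sigmoid(w . x) - y) * w for the toy classifier."""
    p = sigmoid(_dot(w, x))
    return [(p - y) * wi for wi in w]

def fgm_perturbation(grad: List[float], epsilon: float = 1.0) -> List[float]:
    """r_adv = epsilon * g / ||g||_2 (zero vector if the gradient vanishes)."""
    norm = math.sqrt(_dot(grad, grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]

def fgm_training_loss(x: List[float], w: List[float], y: int,
                      epsilon: float = 1.0) -> float:
    """Clean loss plus loss on the adversarially perturbed input."""
    r = fgm_perturbation(grad_wrt_input(x, w, y), epsilon)
    x_adv = [xi + ri for xi, ri in zip(x, r)]
    return logistic_loss(x, w, y) + logistic_loss(x_adv, w, y)

x, w, y = [0.5, -0.2], [1.0, 2.0], 1
print(logistic_loss(x, w, y), fgm_training_loss(x, w, y, epsilon=0.1))
```

Because the perturbation is normalized, ε sets its L2 size directly; the extra loss term is meant to act as a regularizer, which matches the abstract's motivation of limiting overfitting on small target-domain data.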
Keywords/Search Tags: cross-domain, named entity recognition, mask keyword pre-training, reading comprehension, adversarial training