
Cross-lingual Named Entity Recognition In A Low Resource Setting

Posted on: 2022-10-19  Degree: Master  Type: Thesis
Country: China  Candidate: Q X She  Full Text: PDF
GTID: 2518306572450784  Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) is an important subtask of natural language processing: it requires extracting meaningful entities from unstructured text and classifying them into specified entity categories. Early systems relied on rule-based or feature-engineering approaches, but these require manually designing large numbers of rules or constructing features at scale. With the rise of deep learning, NER methods based on various neural network architectures have become mainstream and achieve good results; however, their success depends on large amounts of labeled data, and they often perform poorly when annotations are scarce.

This thesis studies cross-lingual named entity recognition in low-resource settings, addressing the problem that some low-resource languages have little or no annotated data. The central idea is to transfer either the training data of a high-resource language, or a model trained on that data, to the low-resource language. The study consists of three parts, each evaluated on the CoNLL dataset and a dataset provided by Huawei.

1. Cross-lingual NER based on word vector projection. When the target language has no labeled training data, training data in the high-resource source language is transferred to the low-resource target language. A bilingual dictionary over the source and target languages is used to align the two semantic spaces, constructing a source-to-target mapping. An attention-based model is also proposed to alleviate the word-order confusion introduced by translation.

2. Cross-lingual NER based on pre-trained language models. The advantage of a cross-lingual pre-trained model is that it automatically aligns the semantic spaces of multiple languages during pre-training. Two improvements are proposed: (a) secondary pre-training on unlabeled, in-domain data drawn from the target-language dataset and the downstream task dataset; (b) when annotated data is available in several languages (multi-source transfer), fine-tuning on a mixture of those languages so that more cross-lingual information can be learned.

3. Cross-lingual NER based on a teacher-student network. A teacher model is trained on annotated source-language data, and a student model is trained on unlabeled target-language data to fit the probability distribution output by the teacher. When multiple teacher models are used, the effect of different attention mechanisms for weighting the teachers is compared, and combining the mixed fine-tuning strategy from the previous part with the teacher-student model achieves the best results.
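As a sketch of the word-vector-projection idea in part 1, the classic orthogonal Procrustes solution aligns two embedding spaces through the word pairs of a bilingual dictionary. The vectors below are random stand-ins (real embeddings would come from e.g. fastText), and the closed-form SVD solution is a standard alignment technique, not necessarily the exact mapping used in the thesis:

```python
import numpy as np

# Toy embeddings: row i of X is the source-language vector of a
# dictionary word, row i of Y the vector of its target-language
# translation. Here Y is X under a hidden rotation, so the exact
# mapping is recoverable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                       # source vectors
true_W = np.linalg.qr(rng.normal(size=(8, 8)))[0]   # hidden rotation
Y = X @ true_W                                      # target vectors

# Orthogonal Procrustes: W* = argmin_{W orthogonal} ||XW - Y||_F,
# solved in closed form via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# After projection, source vectors live in the target space, so a
# tagger trained on projected source data can be applied there.
err = np.linalg.norm(X @ W - Y)
print(round(err, 6))
```

With real embeddings the two spaces are only approximately isometric, so `err` would not vanish; the point is that one small dictionary yields a full mapping for the whole vocabulary.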
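The mixed fine-tuning in part 2 amounts to pooling annotated examples from several languages into one shuffled training stream, so each fine-tuning batch can contain multiple languages. A minimal sketch with hypothetical toy sentences (the tokens, tags, and dataset names are illustrative, not from the thesis data):

```python
import random

# Hypothetical (tokens, tags) pairs for three annotated languages.
data_en = [(["EU", "rejects"], ["B-ORG", "O"])] * 3
data_de = [(["Deutschland", "gewinnt"], ["B-LOC", "O"])] * 2
data_nl = [(["Amsterdam", "vandaag"], ["B-LOC", "O"])] * 2

# Mixed fine-tuning: pool every annotated language into a single
# training set and shuffle, rather than training per-language.
mixed = data_en + data_de + data_nl
random.seed(0)
random.shuffle(mixed)

def batches(data, size=4):
    """Yield fixed-size minibatches; each may mix several languages."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

for batch in batches(mixed):
    print(len(batch))
```

The same loop would feed a multilingual encoder's fine-tuning step; the mixing itself is purely a data-pipeline decision.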
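Part 3's teacher-student training fits the student to the teacher's output distribution on unlabeled target-language data. A minimal numpy sketch with linear-softmax taggers standing in for both models (the thesis models are neural; KL minimisation against the teacher's soft outputs is the assumed distillation objective):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
n, d, k = 200, 16, 5   # unlabeled target examples, features, tag classes

# Stand-in "teacher": a fixed linear tagger trained on the source
# language; its soft outputs serve as pseudo-labels.
W_teacher = 0.5 * rng.normal(size=(d, k))
X = rng.normal(size=(n, d))          # target-language features, no gold tags
p_teacher = softmax(X @ W_teacher)

# Student starts from scratch and fits the teacher's distribution.
# For a linear-softmax student, the gradient of the soft cross-entropy
# is X^T (p_student - p_teacher) / n.
W_student = np.zeros((d, k))
for step in range(2000):
    p_student = softmax(X @ W_student)
    W_student -= 0.5 * (X.T @ (p_student - p_teacher) / n)

# Remaining KL(p_teacher || p_student): small once the student has
# absorbed the teacher's behaviour on the target language.
p_final = softmax(X @ W_student)
kl = np.mean(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_final)), axis=1))
print(round(kl, 4))
```

With several teachers, `p_teacher` would be replaced by a weighted average of the teachers' distributions, which is where the attention-based weighting compared in the thesis comes in.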
Keywords/Search Tags: low resource, cross-lingual named entity recognition, pre-trained language model, teacher-student network