
Research On Nested Named Entity Recognition For Radar And Combat Systems In Low Resource Scenarios

Posted on: 2022-12-12  Degree: Master  Type: Thesis
Country: China  Candidate: L P Hua  Full Text: PDF
GTID: 2518306776492424  Subject: Computer Software and Application of Computer
Abstract/Summary:
In the field of radar and combat systems, extracting radar and weapon entities from text is one of the essential tasks for constructing a knowledge graph of electromagnetic radiation sources. Because of the domain's specificity, intelligence texts contain many nested named entities. Nested Named Entity Recognition (NNER) automatically extracts nested named entities of predefined semantic types from text, yielding rich information about entities and the semantic relationships between them. Most NNER models assume that sufficient training samples are available. However, owing to the particularity of the military field and the high cost of annotation, sufficient unlabeled data cannot be obtained. Data exploration also shows that there is only a tiny publicly annotated dataset and little auxiliary data such as knowledge bases or domain dictionaries. As a result, NNER for radar and combat systems faces low-resource challenges. Given the particularity of radar and combat systems, this thesis studies NNER in low-resource scenarios, and the main contributions are as follows:

(1) For the problem of insufficient unlabeled data, two data augmentation algorithms are proposed: the Single-pass Automated Data Selection algorithm (SADS) and the BERT-Based Label-aware Contextual enhancement Algorithm (BBLCA). SADS learns the distribution characteristics of the domain data through incremental clustering and then equalizes samples drawn from similar-domain data, finally obtaining new domain data with balanced sample categories. Built on the chinese-BERT-WWM pre-trained model, BBLCA replaces the "segment embedding layer" with a "label layer" in the input to BERT's encoding layer. The masked language model is then applied with random masking and random insertion and deletion of masks, yielding new domain data. We apply the two algorithms to the RadarCorpus and RadarPatentCorpus datasets, respectively, and obtain a large amount of semantically and syntactically correct unlabeled domain data. The results show that the unlabeled data obtained by SADS and BBLCA enriches the diversity of the training samples and improves the model's performance.

(2) For the problem of insufficient labeled data, an NNER benchmark model based on self-training, NNER-DMCT, is proposed. It is designed to generate labels for unlabeled domain data automatically. NNER-DMCT adopts three model frameworks: BERT-CRF, BERT-SPAN, and BERT-Tplinker-NNER. Multi-model differentiated collaborative training is carried out with the proposed BL-Tri-Training algorithm to obtain multiple base learners. The predictions of the base learners are then integrated by a majority voting mechanism to avoid the ambiguity errors caused by a single view. The experimental results show that NNER-DMCT can automatically generate high-quality pseudo-labeled data.

(3) For the pseudo-labeled dataset obtained by the data augmentation methods and the NNER-DMCT model, a Boundary-Aware Span Representation Neural model based on Pseudo-Labels (BASRN-PL) is constructed. It learns from the pseudo-labeled data while accounting for the noise that data contains. The model uses dynamic learnable weights to fully learn the task data and to correct the augmented task data. In addition, the data representation is enhanced with an additional BiLSTM model and a self-attention mechanism. Compared with mainstream NNER models, BASRN-PL achieves better results, which shows that it learns better from pseudo-labeled datasets and achieves better efficiency.
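The majority-voting step in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the voting rule, so the following assumes that each base learner outputs a set of (start, end, label) spans and that a span is kept as a pseudo-label when at least two of the three learners agree on it. The function name `majority_vote`, the label names, the example sentence, and the per-learner predictions are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import List, Set, Tuple

# A span is (start, end_exclusive, label). Nested entities are simply
# spans that overlap, so no flat BIO tagging is assumed here.
Span = Tuple[int, int, str]


def majority_vote(predictions: List[Set[Span]], min_votes: int = 2) -> List[Span]:
    """Keep every labeled span predicted by at least `min_votes` learners.

    Each learner's output is a set, so one learner contributes at most
    one vote per span. The 2-of-3 threshold is an assumption.
    """
    votes = Counter(span for spans in predictions for span in spans)
    return sorted(span for span, count in votes.items() if count >= min_votes)


# Hypothetical sentence and predictions, for illustration only.
tokens = ["the", "AN/SPY-1", "radar", "of", "the", "Aegis", "combat", "system"]
bert_crf = {(1, 3, "RADAR"), (5, 8, "SYSTEM")}
bert_span = {(1, 3, "RADAR"), (1, 2, "MODEL"), (5, 8, "SYSTEM")}
bert_tplinker = {(1, 2, "MODEL"), (5, 8, "SYSTEM"), (4, 8, "SYSTEM")}

pseudo_labels = majority_vote([bert_crf, bert_span, bert_tplinker])
for start, end, label in pseudo_labels:
    print(label, " ".join(tokens[start:end]))
```

In this sketch the nested pair (1, 2, "MODEL") inside (1, 3, "RADAR") survives because each span has two votes, while (4, 8, "SYSTEM") is dropped because only one learner proposed it.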
Keywords/Search Tags:Radar and combat systems, Low resource, Nested named entity recognition, Data augmentation, Pseudo-tagging