Research On Nested Named Entity Recognition For Radar And Combat Systems In Low Resource Scenarios

Posted on:2022-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:L P Hua

Full Text:PDF

GTID:2518306776492424

Subject:Computer Software and Application of Computer

Abstract/Summary:

In the field of radar and combat systems, extracting radar and weapon entities from text is one of the essential tasks for constructing a Knowledge Graph of electromagnetic radiation sources. Because of the domain's specificity, intelligence texts contain many nested named entities. Nested Named Entity Recognition (NNER) automatically extracts nested named entities of predefined semantic types from text, yielding rich information about entities and the semantic relationships between them. Most NNER models assume that sufficient training samples are available. However, because of the particularity of the military field and the high cost of annotation, sufficient unlabeled data is not accessible. Data exploration also shows that only a tiny publicly available annotated dataset exists, with little auxiliary data such as a knowledge base or domain dictionary. As a result, NNER for radar and combat systems faces low-resource challenges. Given the particularity of radar and combat systems, this thesis studies NNER in low-resource scenarios. The main contributions are as follows:

(1) For the problem of insufficient unlabeled data, two data augmentation algorithms are proposed: the Single-pass Automated Data Selection algorithm (SADS) and the BERT Based Label-aware Contextual enhancement Algorithm (BBLCA). SADS learns the distribution characteristics of domain data through incremental clustering and then balances samples drawn from similar-domain data, yielding new domain data with balanced sample categories. Built on the pre-trained chinese-BERT-WWM model, BBLCA replaces the "segment embedding layer" with a "label layer" in the input to BERT's encoding layer. The masked language model is then used to randomly mask, insert, and delete masks, producing new domain data. The two algorithms are applied to the RadarCorpus and RadarPatentCorpus datasets, respectively, and produce a large amount of semantically and syntactically correct unlabeled domain data. The results show that the unlabeled data obtained by SADS and BBLCA enriches the diversity of training samples and improves model performance.

(2) For the problem of insufficient labeled data, an NNER benchmark model based on self-training, NNER-DMCT, is proposed. It is designed to generate labels for unlabeled domain data automatically. NNER-DMCT adopts three model frameworks: BERT-CRF, BERT-SPAN, and BERT-Tplinker-NNER. Multi-model differentiated collaborative training is carried out with the proposed BL-Tri-Training algorithm to obtain multiple base learners. The predictions of the base learners are then integrated by a majority voting mechanism to avoid the ambiguity errors caused by a single view. The experimental results show that NNER-DMCT can automatically generate high-quality pseudo-label data.

(3) For the pseudo-label dataset obtained by the data augmentation methods and the NNER-DMCT model, a Boundary-Aware Span Representation Neural model based on Pseudo-Labels (BASRN-PL) is constructed. It learns from the pseudo-label data while accounting for the noise that data contains. The model uses dynamic learnable weights to learn fully from the task data and to correct the augmented task data. In addition, the data representation is enhanced with an additional BiLSTM model and a self-attention mechanism. Compared with mainstream NNER models, BASRN-PL achieves better results, which shows that it can learn from pseudo-label datasets more effectively and efficiently.
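
The majority-voting step described in contribution (2) can be illustrated with a minimal Python sketch. The abstract does not specify how votes are counted, so everything here is an assumption made for illustration: each nested entity is represented as a (start, end, label) triple, the function name `majority_vote` and the `min_votes` threshold are hypothetical, and the three prediction sets are invented examples rather than outputs of BERT-CRF, BERT-SPAN, or BERT-Tplinker-NNER.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A nested entity is identified by (start, end, label); spans may overlap or nest.
# This representation is an assumption, not taken from the thesis.
Span = Tuple[int, int, str]


def majority_vote(predictions: Iterable[Set[Span]], min_votes: int = 2) -> List[Span]:
    """Keep every span predicted by at least `min_votes` base learners."""
    # Each learner contributes at most one vote per distinct span.
    votes = Counter(span for pred in predictions for span in set(pred))
    return sorted(span for span, n in votes.items() if n >= min_votes)


# Hypothetical predictions of three base learners for one sentence
# (token offsets, end exclusive; label names are invented).
learner_a = {(0, 4, "RADAR"), (6, 9, "WEAPON")}
learner_b = {(0, 4, "RADAR"), (0, 2, "COUNTRY"), (6, 9, "WEAPON")}
learner_c = {(0, 4, "RADAR"), (0, 2, "COUNTRY"), (5, 9, "WEAPON")}

# Spans supported by at least two of the three learners become pseudo-labels;
# the span (5, 9, "WEAPON") has a single vote and is discarded.
pseudo_labels = majority_vote([learner_a, learner_b, learner_c])
print(pseudo_labels)
```

Because votes are counted per span instead of per token, a nested span such as (0, 2, "COUNTRY") inside (0, 4, "RADAR") can be accepted independently of its enclosing entity. Raising `min_votes` to 3 would keep only spans on which all three learners agree.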

Keywords/Search Tags:

Radar and combat systems, Low resource, Nested named entity recognition, Data augmentation, Pseudo-tagging

Related items

1	Research On Nested Named Entity Recognition In Geographical Domain Based On Hierarchical Tagging
2	Chinese Nested Named Entity Recognition Research
3	The Method Of Nested Named Entity Recognition In Microblog
4	The Field Of Music, A Combination Of Rules And Statistical Named Entity Recognition
5	Research On Nested Named Entity Recognition Based On Knowledge Embedding And Boundary Enhancement
6	Research On Boundary-based Nested Named Entity Recognition Method
7	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
8	Research And Implementation Of Nested Named Entity Recognition Based On Graph Attention Network
9	Joint Extraction Of Named Entity Recognition And Entity Relationship Based On Neural Network
10	Weakly Supervised Named Entity Recognition Based On Online Encyclopedia