Nested named entity recognition (NER) identifies entities that are nested within other entities, giving a more complete picture of the entity information in a text. Accurate nested NER supplies richer semantic information to other natural language processing tasks and thereby improves their accuracy and efficiency, which makes it an important research direction in natural language processing. Current nested NER methods typically rely on shallow character-level or word-level features, which cannot fully exploit lexical information; they also use only internal context information and ignore external information. In addition, existing research focuses mainly on English datasets, and Chinese nested named entities have received comparatively little attention. To improve the accuracy of nested NER in English, this paper proposes a method based on a Wikipedia search engine, which introduces external knowledge to obtain richer semantic information. Considering the characteristics of the Chinese language, we further adapt the Wikipedia search engine method and propose a simple and efficient nested NER method assisted by a Wikipedia dictionary, using a biaffine structure to obtain a global view of spans and avoid the limitation of enumerating spans of a specific length. Our work consists of the following two parts:

(1) We propose SGNNER, a model based on a Wikipedia search engine for nested NER in English. The main part of the model is a two-stage, span-based nested NER method that jointly solves span boundary regression and span classification: it first locates entity positions and categories, and then performs span filtering and boundary adjustment. In addition, we integrate a local Wikipedia search engine into the main model. Retrieving the context of each span from the local Wikipedia search engine enhances the original input spans and yields better token representations. A BERT-CRF model produces tags and confidence scores that assist the filter in its binary classification task, improving recognition accuracy. We evaluate the model on four common English nested NER datasets (ACE 2004, ACE 2005, KBP17, and GENIA) through comparative experiments against baseline models and through ablation experiments. The comparative experiments show that SGNNER outperforms the baseline models in identifying nested named entities, with significant improvements in accuracy. The ablation experiments show that each component of the proposed method contributes to model performance to a varying degree.

(2) We propose KBCNNER, a novel model for Chinese nested NER. The model is dictionary-assisted: it uses the Wikipedia dictionary to match word groups and form character-word pairs, which are then integrated into an intermediate layer of BERT to make effective use of its representational power. Chinese word groups carry more semantic information than single characters, so introducing dictionary information enhances the features and leads to richer semantics. The model uses a biaffine structure to obtain a global view of spans, avoiding the limitation of enumerating spans of a specific length. The local interaction between spans is modeled with a convolutional neural network (CNN) to capture the spatial correlation between adjacent spans. Finally, the model’s robustness is enhanced with an R-Drop-based contrastive learning approach. KBCNNER is optimized for the characteristics of the Chinese language and aims to improve the accuracy and efficiency of Chinese nested NER. The model achieves the best results on the Chinese nested datasets People’s Daily and CMeEE, as well as on the Chinese flat datasets Weibo and Resume.
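
The dictionary-matching step described for the Chinese model can be illustrated with a minimal sketch. It assumes a plain in-memory set of words in place of the Wikipedia dictionary and simple substring matching; the function name `match_character_word_pairs`, the `max_word_len` parameter, and the toy lexicon are hypothetical and not taken from the paper. How the resulting pairs are integrated into an intermediate layer of BERT is not shown.

```python
from typing import Dict, List, Set


def match_character_word_pairs(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Dict[str, object]]:
    """Pair each character with every lexicon word that covers its position."""
    matched: List[List[str]] = [[] for _ in sentence]
    for start in range(len(sentence)):
        # Try every candidate word of length 2..max_word_len starting here.
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > len(sentence):
                break
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to each character it spans.
                for pos in range(start, end):
                    matched[pos].append(word)
    return [{"char": ch, "words": words} for ch, words in zip(sentence, matched)]


# Hypothetical toy lexicon standing in for a Wikipedia-derived dictionary.
toy_lexicon = {"北京", "北京大学", "大学"}
pairs = match_character_word_pairs("北京大学", toy_lexicon)
for pair in pairs:
    print(pair["char"], pair["words"])
```

In this toy example the nested words "北京" and "大学" are both attached to characters that are also covered by the longer word "北京大学", which is the kind of overlapping lexical evidence a nested NER model could draw on.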