With the rapid development of computer and Internet technology, the volume of data on the network is growing exponentially. Artificial intelligence (AI) has become a major research focus as a means of relieving the burden of manual data processing. Natural language processing (NLP) is a key technology for efficient data acquisition and an important research direction within AI. Named entity recognition (NER), a fundamental task in NLP, not only has high research value in its own right but also underpins many downstream tasks such as information retrieval, machine translation, and knowledge graph construction. NER algorithms are strongly language-specific: unlike most Latin-script languages, Chinese sentences carry no explicit word-boundary information, which poses great challenges for Chinese NER. Improving the entity recognition accuracy of models has therefore become one of the main difficulties. This paper proposes improvements and optimizations to Chinese NER algorithms from four aspects: Chinese character features, word segmentation technology, attention models, and pre-trained language models, with the aim of further improving recognition performance. The innovative contributions of this paper are as follows:

(1) To address the word segmentation errors caused by ambiguous word boundaries and insufficient semantic information in Chinese sentences, this paper proposes a multi-embedding model named MNSR (Multi-embedding Network based on Soft Lexicon and Radical-feature). The model attends to Chinese character glyphs and lexical information, enriching semantics by fusing radical-level embeddings with the Soft Lexicon method to reduce segmentation errors. It also strengthens the extraction of latent word features and alleviates gradient vanishing during training
by gated convolutional networks. Experimental results show that MNSR performs well on Chinese NER tasks, achieving F1 scores of 76.40%, 64.30%, and 96.16% on the public OntoNotes 4.0, Weibo, and Resume datasets, respectively. Compared with both traditional models and recent models from the literature, MNSR improves recognition performance to varying degrees.

(2) To address the information loss caused by differing channel importance during convolutional pooling, and the lack of fine-grained correlation features across the multiple embeddings, this paper proposes the SECL-MNSR model (Incorporating SE and Cross-Lattice Attention into MNSR), which introduces an SE attention mechanism and a Cross-Lattice attention mechanism. The SE module is applied in the feature extraction network of the radical-level embedding model; it uses two fully connected layers to learn weight parameters and explicitly model the interdependencies between channels. The Cross-Lattice module operates on the different input representations to capture fine-grained correlations across feature spaces. Experimental results indicate that the two attention modules effectively enhance the model's ability to capture Chinese entity information. Compared with MNSR, SECL-MNSR improves F1 on the OntoNotes 4.0, Weibo, and Resume datasets by 0.83%, 0.87%, and 0.28%, respectively, further improving entity recognition accuracy.

(3) To address SECL-MNSR's weakness in extracting contextual information from text and the insufficiency of training data, this paper introduces the BERT pre-trained network and proposes the B-SECL-MNSR model (Incorporating BERT into SECL-MNSR). The model uses BERT to capture long-distance semantic dependencies and incorporates the results into the character representations to enhance data
availability. Experimental results show that B-SECL-MNSR outperforms the models reported in the related literature, with F1 scores of 83.38%, 72.21%, and 96.69% on the OntoNotes 4.0, Weibo, and Resume datasets, respectively. Compared with SECL-MNSR, B-SECL-MNSR significantly improves recognition performance at an acceptable cost in inference efficiency, raising F1 by 6.15%, 7.04%, and 0.25%, respectively.
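To illustrate the SE (Squeeze-and-Excitation) mechanism that part (2) applies in the radical-level feature extraction network, the following is a minimal pure-Python sketch of the squeeze → excitation → scale dataflow. This is not the thesis implementation: the weight matrices, reduction ratio, and feature sizes here are hypothetical toy values; in the actual model the two fully connected layers are learned end-to-end.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_reweight(channels, w1, b1, w2, b2):
    """Squeeze-and-Excitation over a toy feature map.

    channels: list of C channels, each a list of spatial activations
    w1, b1:   first FC layer (C -> C/r bottleneck), followed by ReLU
    w2, b2:   second FC layer (C/r -> C), followed by sigmoid
    Returns the channel-reweighted feature map.
    """
    # Squeeze: global average pooling yields one descriptor per channel
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation, layer 1: FC + ReLU in the reduced (bottleneck) dimension
    h = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
         for row, b in zip(w1, b1)]
    # Excitation, layer 2: FC + sigmoid gives per-channel weights in (0, 1),
    # explicitly modeling interdependencies between channels
    s = [sigmoid(sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(w2, b2)]
    # Scale: reweight each channel by its learned importance
    return [[s_c * v for v in ch] for s_c, ch in zip(s, channels)]
```

The sketch only shows the dataflow; the key design point is that the sigmoid gate lets the network suppress uninformative channels (a channel whose descriptor contributes nothing is down-weighted) rather than pooling all channels with equal importance.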