| As a fundamental subfield of Natural Language Processing (NLP), Named Entity Recognition (NER) plays a key role in important downstream tasks such as information extraction and dialogue systems. Although NER has been studied for many years and its techniques have been continuously improved, Chinese NER still faces several problems. Sources of Chinese text data are relatively narrow, Chinese text is written compactly with no explicit word boundaries, and words are often polysemous. In professional domains in particular, texts follow domain-specific conventions, have a more complex structure, and often contain difficult words. This paper summarizes recent research on NER and on Chinese NER, builds on and improves the current mainstream methods, and makes the following contributions: (1) Considering how important the description of entity categories is for determining entity boundaries in Chinese NER, this paper proposes an entity recognition model based on machine reading comprehension that enhances prior information (BERT-MRC-GP). The model casts NER as a machine reading comprehension (MRC) task and improves the entity position prediction step of that paradigm by replacing the original pointer network, which is based on binary classifiers, with a global pointer network. This paper combines the two for the first time. The model retains the MRC property of introducing entity-type prior information in question-and-answer form, which avoids a second entity classification step. At the same time, global pointer decoding jointly predicts the head and tail positions of entity spans, which keeps training and prediction consistent, reduces the loss computation, and simplifies the model architecture. The model obtains better entity recognition performance, reaching F1 scores of 95.46% and 80.83% on the flat NER datasets MSRA and OntoNotes 4.0, and 67.41% on the nested NER dataset CMeEE. (2) To address the limited understanding of professional-domain texts in Chinese domain entity recognition, and the resulting inability to capture the characteristics of domain texts, this paper takes Chinese medical texts as the data source and proposes a Chinese medical text entity recognition model based on feature fusion (M2F-MNER). The model focuses on better vector representations of Chinese medical text. For professional-domain texts, introducing word vectors trained with a domain dictionary can greatly improve the understanding of domain text, and word-vector information is crucial for determining Chinese entity boundaries. The model obtains vectors from three kinds of features (characters, words, and glyphs), uses cross-attention to integrate the word information associated with the current character more deeply into the character vector, and then concatenates Wubi feature vectors, which can help distinguish medical entity categories to some extent. Experiments show that the model reaches a recall of 83.95%, a precision of 82.44%, and an F1 score of 83.18% on the compiled medical dataset, and ablation experiments demonstrate the effectiveness of each module's design. |
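
The pairing of an MRC-style query with global pointer decoding described in contribution (1) can be illustrated with a minimal sketch. This is not the paper's implementation. The character-level tokens, the function names `build_mrc_input` and `decode_spans`, the zero threshold, and the hand-made score matrix are all assumptions for illustration. In a real system, the scores would come from a BERT encoder followed by a span-scoring head.

```python
from typing import List, Tuple


def build_mrc_input(query: str, context: str) -> Tuple[List[str], int]:
    """Join a type-describing query and the context in BERT sentence-pair style.

    Character-level tokens are an assumption made for illustration.
    Returns the token list and the index where the context begins.
    """
    tokens = ["[CLS]"] + list(query) + ["[SEP]"] + list(context) + ["[SEP]"]
    context_start = len(query) + 2  # skip [CLS], the query, and the first [SEP]
    return tokens, context_start


def decode_spans(
    scores: List[List[float]],
    context_start: int,
    context_len: int,
    threshold: float = 0.0,  # assumed decision threshold
) -> List[Tuple[int, int]]:
    """Read entity spans from a (head, tail) score matrix.

    scores[i][j] scores the span from token i to token j as one unit, so head
    and tail are predicted jointly. Only context positions with head <= tail
    are visited. Returned indices are relative to the context.
    """
    spans = []
    end_limit = context_start + context_len
    for head in range(context_start, end_limit):
        for tail in range(head, end_limit):
            if scores[head][tail] > threshold:
                spans.append((head - context_start, tail - context_start))
    return spans


# Hand-made scores stand in for model output (hypothetical values).
query, context = "机构", "北京大学在北京"
tokens, start = build_mrc_input(query, context)
n = len(tokens)
scores = [[-1.0] * n for _ in range(n)]
scores[start + 0][start + 3] = 2.0  # span covering "北京大学"
scores[start + 5][start + 2] = 2.0  # head after tail: never visited
scores[1][2] = 2.0                  # inside the query: never visited
entities = [context[h:t + 1] for h, t in decode_spans(scores, start, len(context))]
```

Because the query already names the entity type, every decoded span inherits that type and needs no second classification step. Running one query per type also lets overlapping spans of different types coexist, which is what nested datasets require.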