Named entity recognition (NER) is a sub-task of natural language processing that aims to mine key information from large amounts of structured and unstructured text data by identifying the boundaries and types of entities with specific meanings in natural language text. Character embedding models can effectively avoid the cumulative errors in Chinese word segmentation caused by the lack of natural word delimiters, but Chinese is unique and complex in that a single character can carry multiple meanings, which leaves the sequence information carried among characters insufficient. In addition, existing methods focus more on how to better extract global dependencies among characters, while local dependencies among characters are also particularly important. These problems may cause the model to learn an under-representation of Chinese features. Therefore, effective extraction of global and local inter-character dependencies is the key to improving the performance of character embedding models. To address the above issues, this paper adopts a deep learning approach, using an improved Transformer encoding structure and dilated convolution as the encoder of the model to extract global and local features between characters within the text. The model extracts richer semantic features from Chinese text without introducing word embeddings or external knowledge, and improves performance in the field of Chinese named entity recognition. The details of the work are as follows:

(1) To address the phenomenon of multiple meanings per character in Chinese, this paper proposes a new Transformer encoding structure with a multi-head self-attention mechanism containing dual position encoding. The absolute position encoding gives each character a unique position code, while the relative position encoding effectively captures the direction of and distance between characters. This mechanism combines the advantages of absolute and relative position encoding and designs both to interact with the semantic information, thus enhancing the model's ability to extract the global context. An illustrative sketch of this mechanism is given after this summary.

(2) To address the weak local representation in Chinese NER feature extraction, this paper proposes a feed-forward multi-scale dilated convolution fusion mechanism. Considering that a fixed interaction distance may limit local feature extraction, this branch uses multiple convolutional layers with different dilation rates to extract local features at different inter-character interaction distances and fuses the multi-scale features in a feed-forward manner, which improves the model's ability to extract local context. A sketch of this branch is likewise given below.

(3) To address the problem of inadequate feature representation in Chinese NER models, this paper proposes a new Chinese named entity recognition model, the dual-branch multi-feature model (DBM-CNER). The model is designed with a two-branch structure that aims to improve its ability to capture long- and short-distance dependencies among characters in text sequences and to comprehensively characterize the global and local contextual features of text sequences from multiple perspectives. A sketch of the two-branch combination is also given below.

(4) To address the inadequate representation of domain text features in the Chinese clinical electronic medical record named entity recognition task, the proposed DBM-CNER model is applied to global and local feature extraction on Chinese clinical data, aiming to improve the accuracy of the semantic information extracted from electronic medical record text. The model's performance is validated on the CCKS competition dataset.
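As a minimal sketch of the dual-position-encoding attention described in (1): sinusoidal absolute position encodings are added to the character embeddings, and a learned relative-position bias, indexed by the signed character offset so that both direction and distance are preserved, interacts with the semantic query/key scores. The module name, dimensions, and the exact way the two encodings are combined with the semantics are assumptions for illustration, not the thesis's exact design.

```python
import math
import torch
import torch.nn as nn


class DualPositionSelfAttention(nn.Module):
    """Illustrative multi-head self-attention with dual position encoding:
    fixed sinusoidal absolute encodings added to the input, plus a learned
    relative-position bias that preserves direction and distance.
    Hyper-parameters and the combination scheme are assumptions."""

    def __init__(self, d_model=256, n_heads=8, max_len=512):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learned bias per head for every signed offset in [-(max_len-1), max_len-1].
        self.rel_bias = nn.Embedding(2 * max_len - 1, n_heads)
        self.max_len = max_len

        # Fixed sinusoidal absolute position encoding.
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("abs_pe", pe)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        x = x + self.abs_pe[:n]                # inject absolute positions
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)     # semantic term
        offsets = torch.arange(n, device=x.device)
        rel = offsets[None, :] - offsets[:, None] + self.max_len - 1  # signed offset -> embedding index
        scores = scores + self.rel_bias(rel).permute(2, 0, 1)         # relative term: direction + distance

        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(ctx)
```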
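Similarly, a sketch of the feed-forward multi-scale dilated convolution branch from (2): parallel 1-D convolutions with different dilation rates capture local character interactions at several distances, and their outputs are fused by a position-wise feed-forward layer. The dilation rates, kernel size, and fusion by concatenation followed by a feed-forward projection are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultiScaleDilatedConvBranch(nn.Module):
    """Illustrative local-feature branch: parallel 1-D convolutions with
    different dilation rates widen the inter-character interaction distance,
    and a position-wise feed-forward layer fuses the multi-scale outputs.
    Dilation rates and the fusion scheme are assumptions, not the thesis's."""

    def __init__(self, d_model=256, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, d_model, kernel_size,
                      padding=d * (kernel_size - 1) // 2, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Sequential(                      # feed-forward fusion of scales
            nn.Linear(d_model * len(dilations), d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, x):                               # x: (batch, seq_len, d_model)
        h = x.transpose(1, 2)                           # Conv1d expects (batch, channels, seq_len)
        scales = [conv(h).transpose(1, 2) for conv in self.convs]
        return self.fuse(torch.cat(scales, dim=-1))     # concatenate scales, then fuse
```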
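Finally, the two-branch structure of (3) might combine the global (attention) branch and the local (dilated convolution) branch roughly as follows, reusing the two sketch modules above. The fusion by concatenation and the linear tag classifier are placeholders, since the abstract does not specify how branch outputs are merged or decoded (e.g. whether a CRF layer is used).

```python
import torch
import torch.nn as nn


class DBMCNERSketch(nn.Module):
    """Rough two-branch sketch of DBM-CNER: a global branch (dual-position
    attention) and a local branch (multi-scale dilated convolution) run in
    parallel over character embeddings; their outputs are concatenated and
    projected to per-character label scores. Fusion and decoding here are
    assumptions, not the thesis's exact architecture."""

    def __init__(self, vocab_size, num_labels, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.global_branch = DualPositionSelfAttention(d_model)   # long-distance dependencies
        self.local_branch = MultiScaleDilatedConvBranch(d_model)  # short-distance dependencies
        self.classifier = nn.Linear(2 * d_model, num_labels)      # placeholder decoder (no CRF shown)

    def forward(self, char_ids):                                  # char_ids: (batch, seq_len)
        x = self.embed(char_ids)
        fused = torch.cat([self.global_branch(x), self.local_branch(x)], dim=-1)
        return self.classifier(fused)                             # (batch, seq_len, num_labels)
```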