Modern information technologies such as artificial intelligence and the Internet of Things have been applied in healthcare, rapidly advancing the development of smart healthcare. Among the many kinds of medical information, the accuracy of named entity recognition (NER) on Chinese electronic medical records (EMRs) directly determines the quality of medical knowledge graphs, and a high-quality medical knowledge graph is the basis for functions such as intelligent clinical assistance and intelligent online consultation. Chinese EMR data are unstructured, mix Chinese and English, vary widely in expression, and are often colloquial, which makes NER on them a complex task. Studying high-performance NER algorithms for Chinese EMRs and their system applications is therefore a meaningful topic that will help advance smart healthcare. For text that mixes Chinese and English, this paper focuses on three difficulties: determining entity boundaries, recognizing entities of unusual length, and annotating corpora through crowdsourcing. The main contributions are as follows:

(1) A named entity recognition algorithm based on character vectors fused with four-corner number features is proposed. Compared with other Chinese character encodings such as radical features, the four-corner number fully represents the two-dimensional structure of a Chinese character and has a low code repetition rate. In this algorithm, each character of the medical record text is mapped to a one-hot encoding of its four-corner number, which is concatenated with the BERT character vector; a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) then predicts the named entity labels. Experimental results show that adding the four-corner number feature substantially improves entity recognition accuracy on Chinese EMRs: the F1-score reaches 87.17% on the CCKS2019 corpus, 2.6% higher than the radical-feature algorithm.

(2) A named entity recognition algorithm for Chinese EMRs based on model fusion is proposed. The algorithm combines multiple BERT and XLNet models of different structures, each with a different weight. BERT extracts different semantic information at different network layers, while XLNet has an advantage in extracting semantics from very long text. The results show that the proposed multi-model fusion algorithm reduces the recognition error rate for very short and very long entities, and the F1-score reaches 89.27%. Performance on drug entities and disease diagnosis entities improves substantially, by 12.96% and 7.99% respectively over a single model.

(3) A crowdsourcing system for named entity annotation is designed and implemented to address the difficulty of corpus collection. The system applies the NER model proposed in this paper to automatically identify named entities in EMRs, and supports crowdsourced distribution of medical records, automatic labeling by the model, and administrator review. It can effectively expand the scale of the EMR corpus and improve the reliability of crowdsourced named entity labels.
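One way to realize the weighted combination of BERT and XLNet models in contribution (2) is weighted soft voting over each model's per-token label probabilities. This is an assumption: the abstract states only that the models are combined with different weights, not how their outputs are merged. The function name `fuse_predictions`, the BIO label set, the weights, and the probability values below are all hypothetical.

```python
from typing import List, Sequence


def fuse_predictions(
    model_probs: Sequence[Sequence[Sequence[float]]],  # [model][token][label]
    weights: Sequence[float],
    labels: Sequence[str],
) -> List[str]:
    """Weighted soft voting: mix per-token label distributions, then take the argmax."""
    if len(model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    fused: List[str] = []
    for t in range(len(model_probs[0])):
        scores = [0.0] * len(labels)
        for probs, w in zip(model_probs, weights):
            for k, p in enumerate(probs[t]):
                scores[k] += (w / total) * p  # weights are normalized to sum to 1
        best = max(range(len(labels)), key=lambda k: scores[k])
        fused.append(labels[best])
    return fused


# Hypothetical outputs of two models (e.g. a BERT variant and an XLNet variant)
# for a two-character drug name, over a BIO label set.
labels = ["O", "B-DRUG", "I-DRUG"]
model_a = [[0.6, 0.3, 0.1], [0.5, 0.1, 0.4]]
model_b = [[0.2, 0.7, 0.1], [0.2, 0.1, 0.7]]
print(fuse_predictions([model_a, model_b], weights=[0.4, 0.6], labels=labels))
# ['B-DRUG', 'I-DRUG']
```

Here `model_a` alone would tag both characters `O`, while the more heavily weighted `model_b` pulls the fused result to a complete drug entity. Per-token voting can still produce invalid BIO sequences (an `I-` tag without a preceding `B-`), so a practical system would likely add a CRF layer or a post-processing step.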
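The F1-scores reported above are the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes strict entity-level scores, assuming an entity counts as correct only when its span and type both match the gold annotation exactly; the abstract does not state the matching criterion, and the entity tuples and type names are illustrative.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1 (span and type must both match)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 2, "DISEASE"), (5, 8, "DRUG"), (10, 12, "ANATOMY")}
pred = {(0, 2, "DISEASE"), (5, 7, "DRUG")}  # second span has a wrong end boundary
p, r, f1 = entity_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.333 0.4
```

Under strict matching, the drug entity with a wrong boundary counts as both a false positive and a false negative, which illustrates why entity boundary determination weighs so heavily on the final score.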
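The workflow in contribution (3), where the model pre-labels a record, a crowd annotator corrects it, and an administrator reviews it, can be sketched as a small state machine. This is a hypothetical sketch: the state names, the functions `create_task`, `submit`, `review` and `approved_corpus`, and the rule that rejected tasks return to annotators are assumptions, and `dummy_model` stands in for the trained NER model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)


class Status(Enum):
    PRE_LABELED = "pre_labeled"  # entities proposed by the NER model
    SUBMITTED = "submitted"      # corrected by a crowd annotator
    APPROVED = "approved"        # accepted by an administrator
    REJECTED = "rejected"        # sent back for re-annotation


@dataclass
class Task:
    text: str
    entities: List[Entity] = field(default_factory=list)
    status: Status = Status.PRE_LABELED


def create_task(text: str, model: Callable[[str], List[Entity]]) -> Task:
    """Pre-label a medical record with the model so annotators only need to correct it."""
    return Task(text=text, entities=model(text))


def submit(task: Task, corrected: List[Entity]) -> None:
    """Store an annotator's corrected entities and hand the task to review."""
    if task.status not in (Status.PRE_LABELED, Status.REJECTED):
        raise ValueError(f"cannot submit a task in state {task.status.value}")
    task.entities = corrected
    task.status = Status.SUBMITTED


def review(task: Task, accept: bool) -> None:
    """Administrator decision: approve the labels or send the task back."""
    if task.status is not Status.SUBMITTED:
        raise ValueError("only submitted tasks can be reviewed")
    task.status = Status.APPROVED if accept else Status.REJECTED


def approved_corpus(tasks: List[Task]) -> List[Tuple[str, List[Entity]]]:
    """Only administrator-approved annotations enter the training corpus."""
    return [(t.text, t.entities) for t in tasks if t.status is Status.APPROVED]


def dummy_model(text: str) -> List[Entity]:  # stand-in for the trained NER model
    return [(0, 2, "DISEASE")] if text.startswith("头痛") else []


t = create_task("头痛三天", dummy_model)
submit(t, [(0, 2, "SYMPTOM")])
review(t, accept=False)  # administrator rejects the annotator's change
submit(t, [(0, 2, "DISEASE")])
review(t, accept=True)
print(approved_corpus([t]))  # [('头痛三天', [(0, 2, 'DISEASE')])]
```

Keeping review as an explicit gate means only checked annotations grow the corpus, which matches the stated goal of improving the reliability of crowdsourced labels.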