| With the development of domestic medical informatization, a growing number of medical institutions store medical information in electronic medical records (EMRs), producing large volumes of such records. Named entity recognition (NER) on EMRs identifies important medical information stored in the records, such as patient symptoms, disease names, and examination measures. It can be used to build intelligent EMR systems, assist medical staff in diagnosis and decision-making, and promote informatization in the field of medical care, and it has attracted the attention of scholars in recent years. At present, NER for Chinese EMRs suffers from two problems: imbalanced entity categories and sparse features. To address the imbalance of entity categories, this paper first proposes a Chinese EMR NER model based on synonym oversampling (CSOT-BiLSTM-CRF). Building on this model, and taking into account the sparseness of entity features in medical record text, it then proposes a second Chinese EMR NER model based on multi-feature fusion (Fusion_input-BiGRU-CRF). The main research work of this paper comprises the following two parts: (1) Research on named entity recognition of Chinese electronic medical records based on CSOT-BiLSTM-CRF. First, a word similarity calculation method is improved by incorporating an extended synonym dictionary, and the Combine Synonyms Over-sampling Technique (CSOT) oversampling algorithm is proposed on the basis of this method. Second, the CSOT algorithm is used to oversample the minority classes and construct a class-balanced data set. Third, a Bidirectional Long Short-Term Memory (BiLSTM) network extracts contextual features from the vectorized data in order to capture long-distance dependencies in the text. Finally, the probability matrix produced by feature extraction is input into a Conditional Random Field (CRF), which automatically learns the constraints hidden in the label sequence and decodes the final label sequence, completing the named entity recognition task. Experiments show that the CSOT-BiLSTM-CRF model improves the F1 values of the two minority classes by 7.80% and 8.95% while still completing the named entity recognition task. (2) Research on named entity recognition of Chinese electronic medical records based on Fusion_input-BiGRU-CRF. On the basis of model (1), and considering the sparse entity features in EMR text, a Fusion_input-BiGRU-CRF recognition model is proposed. First, word boundary, part-of-speech, and dependency analysis features are extracted for the entities in the class-balanced data set, and each feature is mapped to a vector. Second, each feature vector is assigned a different weight through its vector dimension, and the vectors are concatenated into a multi-feature fusion vector to enhance the semantic representation of entities. Third, to speed up computation and reduce running time, the multi-feature fusion vector is input into a Bidirectional Gated Recurrent Unit (BiGRU) network to automatically extract text features. Finally, the entity tags are predicted by the CRF to complete named entity recognition. Experiments show that the Fusion_input-BiGRU-CRF model reaches an F1 value of 89.27% on the CCKS2017 data set, a good recognition result. Figure [19] Table [14] Reference [64]... |
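The abstract does not spell out how CSOT works, so the following is only a minimal sketch of the general idea behind synonym-based oversampling for BIO-tagged data, not the paper's algorithm. It assumes that a new training sentence is produced by swapping one minority-class entity mention for a dictionary synonym and re-tagging the replaced span. The word similarity calculation is omitted entirely. The names `SYNONYMS`, `entity_spans`, `synonym_variants`, and `oversample` are hypothetical, and English word tokens stand in for Chinese characters.

```python
import random
from collections import Counter

# Hypothetical toy synonym dictionary: mention tokens -> alternative mentions.
# It stands in for the extended synonym dictionary mentioned in the abstract.
SYNONYMS = {
    ("fever",): [("pyrexia",), ("high", "temperature")],
    ("chest", "x-ray"): [("chest", "radiograph")],
}

def entity_spans(tags):
    """Return (start, end, label) spans decoded from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def synonym_variants(tokens, tags, minority):
    """Yield sentence copies with one minority-class mention replaced by a synonym."""
    for start, end, label in entity_spans(tags):
        if label not in minority:
            continue
        for syn in SYNONYMS.get(tuple(tokens[start:end]), []):
            new_tokens = tokens[:start] + list(syn) + tokens[end:]
            # Re-tag the replaced span, since the synonym may differ in length.
            new_span = ["B-" + label] + ["I-" + label] * (len(syn) - 1)
            yield new_tokens, tags[:start] + new_span + tags[end:]

def oversample(dataset, minority, target, seed=0):
    """Add variants until each minority class has `target` mentions or candidates run out."""
    rng = random.Random(seed)
    counts = Counter(lab for _, tags in dataset for _, _, lab in entity_spans(tags))
    candidates = [v for toks, tags in dataset
                  for v in synonym_variants(toks, tags, minority)]
    rng.shuffle(candidates)
    augmented = list(dataset)
    for toks, tags in candidates:
        labels = [lab for _, _, lab in entity_spans(tags)]
        if any(lab in minority and counts[lab] < target for lab in labels):
            augmented.append((toks, tags))
            counts.update(labels)
    return augmented, counts

# Toy data set: DISEASE is the majority class, SYMPTOM and EXAM are minorities.
data = [
    (["patient", "has", "fever"], ["O", "O", "B-SYMPTOM"]),
    (["ordered", "chest", "x-ray"], ["O", "B-EXAM", "I-EXAM"]),
    (["diagnosed", "with", "pneumonia"], ["O", "O", "B-DISEASE"]),
    (["treated", "pneumonia", "today"], ["O", "B-DISEASE", "O"]),
]
balanced, counts = oversample(data, minority={"SYMPTOM", "EXAM"}, target=2)
print(len(balanced), dict(counts))
```

Oversampling at the mention level keeps the surrounding context of each sentence intact, so the tagger still sees realistic contexts for the minority classes. A sketch closer to the paper would presumably rank or filter candidate synonyms by the improved similarity measure before substitution.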