| An electronic medical record (EMR) is a clinical record generated in the course of medical activities, and it contains a large amount of medical knowledge and patient health information. Entity relation extraction on EMRs aims to extract all entities that represent independent concepts, together with the relationships between them, from complex clinical descriptions, with the goal of converting unstructured free text into structured medical knowledge. Because EMRs contain many medical terms and special symbols, their language differs from that of general text, so natural language processing techniques developed for the open domain perform poorly on EMR text. In view of the scarcity of Chinese EMR resources, the nested structure and fuzzy boundaries of Chinese medical entities, and the lack of text semantic information in existing relation extraction methods, the following research was carried out on entity relation extraction for Chinese EMRs: (1) To address the lack of an entity relation annotation corpus for Chinese EMRs, we obtained the shared datasets released by CCKS (National Conference on Knowledge Graph and Semantic Computing) in 2017, 2019, and 2020. Combining the linguistic characteristics of EMRs with domestic and international research on EMR corpus construction, we formulated a standard for annotating medical entity relations. Following this annotation specification, 11 categories of entity relations were annotated, 7,000 in total, forming a Chinese EMR entity relation annotation corpus for semi-supervised learning. (2) To address entity nesting and fuzzy entity boundaries in Chinese EMR text, an entity recognition method based on joint features and a multi-head attention mechanism is proposed. This method uses joint features composed of characters, parts of speech, and dictionary information as the text vector representation, uses BiLSTM and multi-head attention to extract the global and local features of the sentence, and
finally uses a CRF to combine all the features and predict the entity labels. Compared with four current mainstream models (CRF, BiLSTM-CRF, BiGRU-CRF, and Lattice LSTM), this method achieves the best results, with an F1-score of 89.16%. (3) To address the problem that traditional entity relation extraction methods based on shallow neural networks have weak feature extraction capabilities and cannot effectively capture the complex semantic information in medical record text, a hybrid neural network relation extraction model combining a residual network, a gated recurrent unit, and an attention mechanism is proposed, and a bootstrapping algorithm is used to improve the training process of the model. The experimental results show that the F1-score of this method over all relation categories reaches 89.78%, and that the F1-scores on relation categories such as SAP, SNAP, TeAS, TrAD, and TrAP reach 93.91%, 92.96%, 94.74%, 93.01%, and 95.48%, respectively. (4) The Neo4j graph database is used to store all the extracted entities and relationships, and a Chinese medical knowledge graph is constructed. The knowledge graph is displayed and analyzed through query statements, which verifies the correctness of the constructed medical knowledge graph. |
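The overall and per-category F1-scores reported above can be illustrated with a minimal sketch. It assumes that relations are compared as exact (head, tail, label) triples and that the overall score is micro-averaged; the abstract states neither, so both are assumptions. The function name `relation_f1` and the placeholder entities `e1` to `e7` are hypothetical, and the toy triples are not data from this work.

```python
from collections import Counter


def relation_f1(gold, pred):
    """Return ({label: (precision, recall, f1)}, micro_f1).

    gold and pred are iterables of (head, tail, label) triples.
    Assumption: a predicted triple counts as correct only on an exact match.
    """
    gold, pred = set(gold), set(pred)
    correct = Counter(label for (_, _, label) in gold & pred)
    n_gold = Counter(label for (_, _, label) in gold)
    n_pred = Counter(label for (_, _, label) in pred)

    def prf(tp, predicted, actual):
        # Guard against empty denominators so unseen labels score 0.0.
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    per_label = {
        label: prf(correct[label], n_pred[label], n_gold[label])
        for label in sorted(set(n_gold) | set(n_pred))
    }
    # Micro-average: pool counts over all labels before computing F1.
    micro_f1 = prf(sum(correct.values()), len(pred), len(gold))[2]
    return per_label, micro_f1


# Toy triples with placeholder entities (hypothetical, for illustration only).
gold = {("e1", "e2", "TrAP"), ("e3", "e4", "TeAS"), ("e5", "e6", "SAP")}
pred = {("e1", "e2", "TrAP"), ("e3", "e4", "TeAS"), ("e5", "e7", "SAP")}
per_label, micro_f1 = relation_f1(gold, pred)
```

Here two of the three predicted triples match the gold set, so the pooled precision and recall are both 2/3, while the SAP category scores zero because its only prediction has the wrong tail entity.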