With the rapid digitization of the medical system, a large number of Chinese electronic medical records (EMRs) have emerged. In particular, with an increasingly aging population, the global COVID-19 pandemic, and a shortage of medical resources, fully exploiting the unstructured knowledge in Chinese EMRs to provide more efficient services for doctors and patients has become a research hotspot in information extraction. Named entity recognition and relation extraction are key subtasks of information extraction. The former identifies named entities in unstructured text, and the latter extracts the semantic relations between entity pairs. Together they support applications such as medical knowledge graph construction, personalized recommendation systems, and clinical decision support.

This thesis studies entity recognition and joint entity-relation extraction for Chinese EMR texts. Based on an analysis of the characteristics of Chinese EMR text and the shortcomings of existing methods, it proposes an entity recognition method and a joint entity-relation extraction method suited to medical entities and their relations. The main work comprises the following two parts:

(1) To address the many medical terms and the severe entity nesting in Chinese EMRs, an entity recognition method combining multi-feature embedding with an attention mechanism is proposed. In the input representation layer, the method fuses multi-granularity semantic features of characters, words, and glyphs. In the encoding layer, it introduces an attention mechanism into a bidirectional long short-term memory (BiLSTM) network, so that feature extraction focuses on the characters related to medical entities. A conditional random field (CRF) decoder then produces the optimal joint labeling of six types of medical entities in Chinese EMR text, such as diseases and diagnoses, anatomical sites, and drugs. The method is evaluated on the CCKS2019 and CMD-NER datasets, where the experimental results show that it outperforms the comparison methods.

(2) To address characteristics of Chinese medical records such as many short sentences, weak long-distance relations, and severe relation overlap, a joint entity-relation extraction method based on relation discovery words and a graph convolutional network (GCN) is proposed. Starting from a set of relation discovery words, the method uses a word-level attention mechanism to assign different weights to the contextual descriptors of a relation, thereby enhancing its semantics. On top of text embeddings obtained from a pre-trained language model, a gated feature-fusion mechanism combines the output features of a BiLSTM with the features of the relation discovery words. A weighted dependency graph, pruned by the attention mechanism, is then encoded by the GCN to capture local features of the sentence's syntactic structure. The method is evaluated on the CMeIE and CMRT-RE datasets, and comparisons with a variety of mainstream methods verify its effectiveness for the joint extraction of medical entities and relations.
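Two of the more mechanical components of the relation extraction method in (2), the gated fusion of BiLSTM features with relation-discovery-word features and the GCN encoding of a pruned weighted dependency graph, can be illustrated with a small plain-Python sketch. This is not the thesis's implementation. The gate form g = σ(W[h; r] + b), the threshold-based pruning rule, the row-normalized GCN propagation H' = ReLU(D⁻¹(A + I)HW), and the function names `gated_fusion`, `prune_edges`, and `gcn_layer` are all illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h, r, W, b):
    """Fuse a BiLSTM feature vector h with a relation-discovery-word feature r.

    Assumed gate form (hypothetical): g = sigmoid(W [h; r] + b),
    fused = g * h + (1 - g) * r, elementwise. W is d x 2d, b has length d.
    """
    concat = h + r  # list concatenation [h; r], length 2d
    g = [sigmoid(sum(w * x for w, x in zip(row, concat)) + bias)
         for row, bias in zip(W, b)]
    # Each gate value decides how much of h versus r survives per dimension.
    return [gi * hi + (1.0 - gi) * ri for gi, hi, ri in zip(g, h, r)]


def prune_edges(A, threshold):
    """Zero out dependency edges whose attention weight is below threshold (assumed rule)."""
    return [[a if a >= threshold else 0.0 for a in row] for row in A]


def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1 (A + I) H W), with D the row sums of A + I.

    A: n x n non-negative edge weights, H: n x d_in node features, W: d_in x d_out.
    """
    n, d_in, d_out = len(A), len(H[0]), len(W[0])
    out = []
    for i in range(n):
        # Add a self-loop so each token keeps its own features.
        row = [A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        deg = sum(row)  # >= 1 because of the self-loop and non-negative weights
        # Degree-normalized weighted aggregation over neighbouring tokens.
        agg = [sum(row[j] * H[j][k] for j in range(n)) / deg for k in range(d_in)]
        # Linear projection followed by ReLU.
        proj = [sum(agg[k] * W[k][m] for k in range(d_in)) for m in range(d_out)]
        out.append([max(0.0, p) for p in proj])
    return out


if __name__ == "__main__":
    # Toy 3-token sentence with made-up attention weights on dependency edges.
    A = prune_edges([[0.0, 0.9, 0.1], [0.9, 0.0, 0.7], [0.1, 0.7, 0.0]], 0.5)
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    print(gcn_layer(A, H, W))
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]))
```

With all-zero gate parameters, every gate value is 0.5, so the fused vector is simply the average of the two inputs. A trained gate would instead learn, per dimension, how much weight to give contextual BiLSTM features versus relation-discovery-word features.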