| Objective: Electronic medical records are an important auxiliary tool in hospital diagnosis and treatment. They record the whole course of a patient's care from consultation to discharge and contain a large amount of medical knowledge that serves as the data basis for clinical decision-making, medical knowledge graph construction, intelligent consultation, and other application scenarios. Mining, analyzing, and using the text of electronic medical records therefore plays a fundamental role in supporting intelligent medical care, and information extraction from Chinese medical record text has become a hot research topic in natural language processing. As an important part of information extraction, relationship extraction has been widely studied in general-purpose domains, where deep-learning-based techniques are now widely used. Chinese electronic medical record text, however, poses two challenges for these techniques. First, its structure differs considerably from that of ordinary text: it contains many specialized terms, and its concise writing style packs multiple entity relationships into a single sentence, which leads to overlapping entity relationships. Second, electronic medical record text usually describes a matter in long passages, such as the patient's self-report, the description of the condition, and the corresponding examination and treatment plan. Compared with the text handled in sentence-level relationship extraction, it has stronger contextual connections, the description of an event often spans multiple sentences, and the text is organized at the document level. This paper proposes solutions to these two problems based on the structural characteristics of electronic medical record text. Methods and results: To address the problem of overlapping entity relationships, this paper proposes a cascaded knowledge extraction framework incorporating entity 
features, Em-CasRel. The framework consists of three main components: an encoder module based on a pre-trained language model, an entity type embedding module, and a cascaded decoder module. Unlike models that treat relationship extraction as assigning discrete labels to ordered entity pairs, the proposed model treats the task as identifying, for each subject, the objects that correspond to it under a specific relationship. This formulation addresses the relationship overlap problem at its root and reduces the noise introduced by redundant relationships, while the entity type embedding further narrows the range of candidate relationship types and improves extraction accuracy. Em-CasRel achieves an F1 value of 59.73% on the public dataset CMeIE, an improvement of 1.29%, and an F value of 71.38% on the manually labeled dataset CaMRE, exceeding the other models. For document-level relationship extraction, this paper proposes a Hierarchical and Dual-Channel Graph Convolution Network, HD-GCN, which constructs three graph structures based on different levels of semantic relationships. The model has three parts: a dual-channel encoder module, a hierarchical graph convolutional module, and a relationship classification module. It makes two improvements over previous approaches. First, it uses two parallel encoders to encode the document simultaneously, extracting the interactions between entities and the contextual textual information respectively; compared with single-channel or stack-based approaches, this represents the textual features more completely and avoids the gradient problems caused by an overly deep network. Second, it constructs separate graph structures for the sentences, mentions, and entities in a document. HD-GCN achieves an F1 value of 55.93% on the manually annotated dataset CaMRE-Doc, which is better than the other models. Conclusion: In the Chinese electronic medical record relationship 
extraction task, this paper designs and proposes the Em-CasRel model for the entity relationship overlap problem and the HD-GCN framework for document-level relationship extraction. Both achieve high accuracy and have good research and application value. The work not only provides an important technical method for relationship extraction from electronic medical records, but also offers a useful reference for the construction of medical knowledge graphs and for intelligent diagnosis. |
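
The subject-then-object formulation described for Em-CasRel can be illustrated with a minimal Python sketch. This is not the authors' implementation: the learned subject tagger and relationship-specific object taggers are replaced by hypothetical lookup tables, the relationship schema and the example entities are invented for illustration, and the entity type embedding is approximated by a hard filter on the relationships allowed for each subject type. The sketch only shows why decoding objects per subject and per relationship yields overlapping triples without any special handling.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema: relationships each subject entity type may take.
# A hard filter stands in for the learned entity type embedding.
RELATIONS_BY_SUBJECT_TYPE: Dict[str, List[str]] = {
    "disease": ["symptom", "drug_treatment"],
    "drug": ["adverse_reaction"],
}


def extract_triples(
    subjects: Dict[str, str],
    object_tagger: Dict[Tuple[str, str], List[str]],
) -> Set[Triple]:
    """Cascade: per subject, per relationship allowed by its type, collect objects."""
    triples: Set[Triple] = set()
    for subject, subject_type in subjects.items():
        for relation in RELATIONS_BY_SUBJECT_TYPE.get(subject_type, []):
            # Each (subject, relation) pair is decoded independently, so one
            # entity can appear in several triples.
            for obj in object_tagger.get((subject, relation), []):
                triples.add((subject, relation, obj))
    return triples


# Toy stand-ins for the outputs of the learned taggers (assumed data).
subjects = {"pneumonia": "disease", "amoxicillin": "drug"}
object_tagger = {
    ("pneumonia", "symptom"): ["fever", "cough"],
    ("pneumonia", "drug_treatment"): ["amoxicillin"],
    ("amoxicillin", "adverse_reaction"): ["rash"],
}

triples = extract_triples(subjects, object_tagger)
for triple in sorted(triples):
    print(triple)
```

In this toy input, "pneumonia" is the subject of three triples and "amoxicillin" is an object in one triple and a subject in another, which is the overlap pattern that a single label per entity pair cannot express.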