Since the third technological revolution, Internet technology has developed rapidly and we have entered the era of big data. In the medical field, electronic medical records (EMRs), as the core of medical informatization, have gradually replaced paper records and have become a key resource for medical research. How to efficiently use massive EMR resources to provide intelligent services is one of the major topics in smart medicine. Information extraction is the most important and fundamental step in EMR analysis. Extracting important information from EMRs and presenting it in a structured form not only facilitates the research of medical staff but also provides a basis for higher-level information utilization. Named entity recognition and relation extraction are the core subtasks of information extraction. In medical information extraction, pipeline methods are usually used to first recognize entities/attributes and then extract entity-attribute relations. The pipeline approach, on the one hand, propagates errors from the entity/attribute recognition stage into the relation extraction stage; on the other hand, it ignores the relevance between the two tasks. In recent years, several joint learning methods have been proposed, but they do not exploit the rich linguistic knowledge available in the domain. To address these problems, this paper proposes a context-aware joint extraction method for entities and relations, and evaluates it comprehensively on the public English corpus of SemEval-2015 Task 14 and a Chinese electronic medical record dataset from a Grade-A tertiary hospital. Our research covers the following three aspects:

(1) For medical entity and attribute recognition, we introduce language models on top of Bi-LSTM-CRF to achieve word-level context awareness. We compared three language models: LM, ELMo, and BERT. Experimental results show that language models improve performance, with BERT performing best: the F1 score increases by 3.47% on the English corpus and by 1.43% on the Chinese corpus.

(2) For medical entity relation extraction, we introduce an attention mechanism on top of Bi-SeqLSTM to achieve entity-level context awareness: attention models the interaction between the target "entity-attribute" pair and the context "entity-attribute" pairs, thereby capturing the dependencies among pairs. Experimental results show that the entity-aware attention mechanism raises the F1 score of Bi-SeqLSTM by 1.57% on the English corpus and by 1.04% on the Chinese corpus.

(3) Combining the two points above, we construct a multi-granularity context-aware joint extraction method for entities and relations. The method shares the sentence representation layer and realizes multi-granularity context awareness through the integrated effect of joint learning. Experimental results show that the multi-granularity context-aware method outperforms both the pipeline method and single-granularity context-aware joint methods on entity/attribute recognition and entity relation extraction.
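The CRF layer in the Bi-LSTM-CRF tagger mentioned in aspect (1) decodes the best tag sequence with the Viterbi algorithm over the emission scores produced by the Bi-LSTM. A minimal NumPy sketch of that decoding step (all tensor shapes and names here are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Viterbi decoding for a linear-chain CRF.
    emissions: (T, K) per-token tag scores (e.g. from a Bi-LSTM).
    transitions: (K, K) score of moving from tag i to tag j.
    Returns the highest-scoring tag index sequence of length T."""
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag at t=0
    backptr = np.zeros((T, K), dtype=int)  # back-pointers for path recovery
    for t in range(1, T):
        # candidate[i, j] = score of tag i at t-1, then tag j at t
        candidate = score[:, None] + transitions + emissions[t]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

In a full tagger, `transitions` is learned jointly with the Bi-LSTM so that implausible tag bigrams (e.g. `O` followed by `I-Entity` in a BIO scheme) receive low scores.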
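The entity-aware attention of aspect (2) lets the target "entity-attribute" pair interact with the context pairs in the same document. A minimal sketch of one such dot-product attention step, assuming each pair has already been encoded as a fixed-size vector (the function and variable names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entity_pair_attention(target, context):
    """Attend from a target entity-attribute pair vector (d,) over
    context pair vectors (n, d); return the target concatenated
    with the attention-weighted context summary, shape (2d,)."""
    scores = context @ target              # dot-product relevance scores (n,)
    weights = softmax(scores)              # attention distribution over pairs
    summary = weights @ context            # weighted context summary (d,)
    return np.concatenate([target, summary])
```

The concatenated vector would then feed the relation classifier, so the decision for one pair is conditioned on the other pairs in the sentence.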