A Joint Entity And Relation Extraction Method Of Medical Text Based On Multi-scale Neural Structure

Posted on: 2022-11-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X A Zheng | Full Text: PDF
GTID: 2504306752954359 | Subject: Master of Engineering
Abstract/Summary:
Entity extraction identifies entities with specific meaning in text, and relation extraction determines whether a relationship holds between entities. Joint entity and relation extraction, which has recently become popular, trains both tasks in a single model. In biomedicine, extracting entities and relations is of great value for research on disease treatment. This thesis is part of a project to build a comprehensive medical decision-support platform for public health emergencies, motivated by the COVID-19 outbreak. Its goal is to extract the gene pathways associated with a given disease or virus from biomedical texts, providing a basis for treatment planning. The thesis therefore studies the extraction of gene-related entities and relations from biomedical texts.

Current joint entity and relation extraction approaches have three problems. First, word encoding is incomplete: only the word itself is considered when encoding, and the influence of the sentence as a whole on the word is ignored. Second, there is entity redundancy: extracting entities first and relations second pairs all entities for relation classification, even though some entities never form a relation triple with any other entity. Third, biomedical texts require document-level rather than sentence-level extraction.

To address these problems, the thesis first investigates the multi-headed self-attention mechanism and the sub-obj entity identification method, and proposes a sentence-level joint entity and relation extraction model that fuses the two to improve the accuracy of relation triple extraction in sentence-level text. Building on this work, it then investigates multi-scale neural structures suited to the characteristics of biomedical text, and proposes a document-level joint entity and relation extraction model that combines the multi-headed self-attention mechanism with multi-scale neural structures to improve the accuracy of extracting biomedical entity interactions from document-level biomedical texts.

The specific contributions of this thesis are as follows.
(1) To address incomplete word encoding, a multi-headed self-attention mechanism is introduced to integrate the useful information of the whole sentence into each word, giving a more accurate word encoding. The improved model's F1 score is about 0.8 higher than that of the baseline model on both the CoNLL04 and SciERC datasets, which verifies the effectiveness of the method.
(2) To address entity redundancy, the sub-obj entity identification method is used, which splits the original entity extraction step into two steps: extracting SUBJECT entities and extracting OBJECT entities. The improved model's F1 score is about 0.7 higher than that of the baseline model on both datasets, which verifies the effectiveness of the method.
(3) Building on the first two contributions, a neural network model that combines the multi-headed self-attention mechanism with the sub-obj entity identification method is proposed. Its F1 scores on the two datasets are 0.5-0.7 and 0.4-0.5 higher than those of the two models above, respectively, showing that the accuracy of relation triple extraction is further improved.
(4) To apply joint entity and relation extraction to biomedical texts, the thesis combines the multi-headed self-attention mechanism with a multi-scale neural structure to obtain a document-level joint entity and relation extraction model, moving from sentence-level to document-level extraction. The F1 score of this model on the CKB CORE™ dataset is about 0.5 higher than that of the baseline model, which validates the effectiveness of the model.
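The entity redundancy problem described in the abstract can be illustrated with a small counting sketch. The abstract does not give the thesis's implementation, so everything below is an assumption for illustration: the entity list, the hand-written "tagger outputs" (`subjects`, `objects_for`), and both helper functions are hypothetical. The sketch only contrasts how many candidate pairs reach a relation classifier when all entities are paired versus when subjects are identified first and objects are found per subject.

```python
from itertools import permutations

# Hypothetical entities found in one sentence (illustrative, not thesis data).
entities = ["TP53", "MDM2", "BRCA1", "EGFR"]


def exhaustive_candidates(entities):
    """Extract-then-pair: every ordered entity pair is sent to the
    relation classifier, including entities that take part in no triple."""
    return list(permutations(entities, 2))


def subject_object_candidates(subjects, objects_for):
    """Subject-then-object: pair each identified subject only with the
    objects identified for that subject."""
    return [(s, o) for s in subjects for o in objects_for.get(s, [])]


# Assumed outputs of a subject tagger and a subject-conditioned object tagger.
subjects = ["MDM2"]
objects_for = {"MDM2": ["TP53"]}

all_pairs = exhaustive_candidates(entities)
sub_obj_pairs = subject_object_candidates(subjects, objects_for)

print(len(all_pairs), len(sub_obj_pairs))  # prints "12 1"
```

With four entities, exhaustive pairing yields 4 × 3 = 12 ordered candidates, while the subject-then-object route in this toy setup yields one; the entities that belong to no triple (here BRCA1 and EGFR) never become candidates.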
Keywords/Search Tags: joint entity and relation extraction, multi-headed self-attention mechanism, sub-obj entity identification method, multi-scale neural structure