
Research On Biomedical Relation Extraction For Multi-granularity Text

Posted on: 2021-01-18  Degree: Master  Type: Thesis
Country: China  Candidate: S Qian  Full Text: PDF
GTID: 2428330626960358  Subject: Computer Science and Technology
Abstract/Summary:
Biomedical relation extraction aims to identify the relationships between entities such as genes, diseases, proteins and drugs in biomedical literature. It is a key step in biomedical information extraction and lays the foundation for constructing and maintaining knowledge bases in the biomedical field. According to the granularity of the text in which the entities are located, this thesis studies biomedical relation extraction at the sentence level and the document level, and focuses on two typical tasks: Drug-Drug Interaction extraction and Chemical Disease Relation extraction.

The Drug-Drug Interaction (DDI) extraction task is to extract interactions between drug entities that appear in the same sentence. Compared with traditional relation extraction tasks, the sentences in this task are long, contain many modifiers and have complicated structures, so they carry a lot of redundant information that makes entity relation extraction more difficult. In this thesis, a DDI extraction model based on an attention mechanism that incorporates dependency information is proposed. The attention mechanism fuses the semantic information of the sentence with the syntactic information of the shortest dependency path. The correlation between the shortest dependency path and the sentence is measured to capture the useful information in the sentence from the perspective of syntactic structure. Our model, which uses the SciBERT pre-trained language model to encode word vectors, is evaluated on the DDIExtraction 2013 corpus. The experimental results show that our system achieves a micro F1-score of 81.76%, which is state-of-the-art performance.

The Chemical Disease Relation extraction task aims to extract chemical-induced disease relations between entities in an article. Compared with intra-sentence relation extraction, it has the following characteristics: 1. Document-level relation extraction involves many inter-sentence relations. 2. The same entity may appear more than once and in many different forms in the document; these occurrences are called the mentions of the entity. However, most existing systems are not good at learning the semantics of distant context, and their methods of fusing multiple entity mentions may result in information loss. In this thesis, we present a novel model based on a multi-attention mechanism that learns a global representation for document-level relation extraction. Global Context-aware Attention (GCA) is proposed to obtain the global semantics of a document, and Global Entity-aware Attention (GEA) is employed at the same time to fuse the information of all mention pairs. Our model is validated on the BioCreAtIvE V Chemical Disease Relation (CDR) dataset, where it achieves an F1-score of 60.1%. The F1-scores for intra-sentence and inter-sentence relation extraction are 65.5% and 42.9%, respectively.

In conclusion, according to the characteristics of relation extraction tasks at different text granularities, this thesis proposes an attention model that incorporates dependency information for sentence-level relation extraction, and a model that learns a global semantic representation over the entire document for document-level relation extraction. Experiments on the DDIExtraction 2013 and BioCreAtIvE V CDR datasets verify the validity of the proposed models.
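The idea of measuring the correlation between the shortest dependency path and the sentence can be illustrated with a small sketch. This is not the thesis's actual model: it assumes plain scaled dot-product attention, uses the mean of the path token vectors as the query, and works on hand-written toy vectors instead of SciBERT embeddings. The function names (`sdp_guided_attention`, `mean_vector`, `softmax`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sdp_guided_attention(sentence_vecs, sdp_vecs):
    """Weight sentence tokens by their similarity to the dependency path.

    Assumption: the query is the mean of the shortest-dependency-path
    token vectors, and scores are scaled dot products. The thesis may
    use a different scoring function.
    """
    query = mean_vector(sdp_vecs)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * t for q, t in zip(query, tok)) / scale
        for tok in sentence_vecs
    ]
    weights = softmax(scores)
    # Attention-pooled sentence representation.
    pooled = [
        sum(w * tok[i] for w, tok in zip(weights, sentence_vecs))
        for i in range(len(query))
    ]
    return weights, pooled


# Toy 2-dimensional "embeddings" for a three-token sentence; the path
# is assumed to consist of the first and third tokens.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sdp = [[1.0, 0.0], [1.0, 1.0]]
weights, pooled = sdp_guided_attention(sentence, sdp)
```

In this toy setup the tokens that lie on the path receive larger weights than the token that does not, which is the intended effect of letting syntactic structure decide which parts of a long sentence matter.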
Keywords/Search Tags: Drug-Drug Interaction Extraction, Chemical-Disease Relation Extraction, Dependency Information, Global Semantic Representations, Attention Mechanism