
Research On Document-level Relation Extraction Algorithm For Biomedical Domain

Posted on: 2021-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: J K Feng  Full Text: PDF
GTID: 2404330626960374  Subject: Computer Science and Technology
Abstract/Summary:
As a popular, frontier interdisciplinary subject, the biomedical field draws on expertise from many disciplines, including biology, the life sciences, medicine, and computer science, and research in this field has therefore received extensive attention. In recent years, with the rapid development of biomedical research, the volume of biomedical literature has grown exponentially, so researchers often have to read a large number of articles to obtain the information they need. Applying text mining technologies to extract valuable biomedical knowledge from unstructured biomedical articles is therefore of great significance for research in this field.

Relation extraction is one of the key tasks of biomedical information extraction. Current mainstream relation extraction methods generally work at the sentence level and focus on extracting the relation between two entities in the same sentence. In the document-level relation extraction task, the two entities are no longer confined to the same sentence; their mentions span multiple sentences and the relation holds at the overall concept level, which makes the task more difficult than sentence-level relation extraction. To address this, we propose a method based on multi-instance learning to extract biomedical entity relations at the document level. The method constructs multiple relation instances for each candidate entity pair, which effectively alleviates a weakness of single-instance methods, where noise introduced by the single relation instance may damage model performance. Compared with existing methods, our method achieves state-of-the-art performance.

In addition, manually labeled datasets in the biomedical field are usually small, which leads to inadequate model training and impairs the relation extraction performance of the system. To solve this problem, we propose a relation extraction method based on distant supervision. In our method, we align an existing knowledge base with biomedical articles through distant supervision to obtain a large amount of labeled data, and then use these data to expand the training set and enhance the learning ability of the model. We also make a preliminary exploration of methods for fusing domain knowledge. By fusing domain knowledge with the semantic information of the text, we further improve relation extraction performance.

Finally, language model pre-training methods have achieved state-of-the-art performance on many natural language processing tasks and have attracted wide attention from researchers. To explore how a pre-trained language model performs on the relation extraction task, we integrate a pre-trained language model into an existing relation extraction model. First, the language model is pre-trained on large-scale unlabeled biomedical data; then the text representation produced by the language model is added to the relation extraction model as a feature representation; finally, the relation prediction is obtained. The experimental results show that the relation extraction method that incorporates the pre-trained language model achieves a clear performance improvement.
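The abstract does not give implementation details for the multi-instance construction or the knowledge-base alignment, so the following is only a minimal sketch under stated assumptions. It assumes each document comes with entity mentions annotated by sentence index, that the candidate pairs are chemical-disease pairs (the entity types `"Chemical"` and `"Disease"` are hypothetical, as are the identifiers `C1`, `D1`, `D2`), and that the knowledge base is a plain set of related entity pairs. The function names `build_bags` and `distant_labels` are illustrative, not the thesis's own.

```python
from collections import defaultdict
from itertools import product


def build_bags(mentions):
    """Build one bag of mention-pair instances per candidate entity pair.

    mentions: list of (entity_id, entity_type, sentence_index) tuples
              for a single document (assumed input format).
    Returns {(chemical_id, disease_id): [(chem_sentence, disease_sentence), ...]}.
    """
    by_type = defaultdict(list)
    for entity_id, entity_type, sent_idx in mentions:
        by_type[entity_type].append((entity_id, sent_idx))

    bags = defaultdict(list)
    # Every chemical mention is paired with every disease mention, so one
    # entity pair can yield several instances spread across sentences.
    for (chem, chem_sent), (dis, dis_sent) in product(
        by_type["Chemical"], by_type["Disease"]
    ):
        bags[(chem, dis)].append((chem_sent, dis_sent))
    return dict(bags)


def distant_labels(bags, knowledge_base):
    """Label each entity pair 1 if the knowledge base lists it, else 0."""
    return {pair: int(pair in knowledge_base) for pair in bags}


# Hypothetical document: C1 is mentioned in sentences 0 and 2,
# D1 in sentence 0, D2 in sentence 3.
mentions = [
    ("C1", "Chemical", 0),
    ("D1", "Disease", 0),
    ("C1", "Chemical", 2),
    ("D2", "Disease", 3),
]
knowledge_base = {("C1", "D1")}  # assumed set of known related pairs

bags = build_bags(mentions)
labels = distant_labels(bags, knowledge_base)
```

In this sketch the label is attached to the bag (the entity pair), not to any single mention pair, which is the usual reason multi-instance learning is paired with distantly supervised data: an individual instance may be noisy, but the bag as a whole is more likely to support the label.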
Keywords/Search Tags:Relation extraction, Multi-instance learning, Distant supervision, Pretrained language model