
Research And Application Of Relation Extraction Between Disease And Symptom From Biomedical Literature

Posted on: 2019-12-22 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Li
GTID: 2404330563958565 | Subject: Computer technology
Abstract/Summary:
With the development of information technology, the field of biomedicine has attracted the attention of a large number of scholars, and the volume of related research output, including academic papers, has grown substantially. According to statistics from the U.S. National Library of Medicine, the number of biomedical papers published in MEDLINE in 2016 was three times that of 2001, and the biomedical literature continues to grow exponentially. Tracking current trends in the field and uncovering knowledge hidden in the literature are major goals for many researchers. However, this rapid growth has also made it harder for medical workers to obtain information: they must read large amounts of medical literature to find what they need, which is time-consuming and inefficient. Automatically extracting the required knowledge from the medical literature is therefore an important research problem; it would greatly improve the efficiency of medical workers and, in turn, advance the biomedical field.

In recent years, text mining and information extraction techniques have been widely studied and applied to biomedical literature for the purpose of automatic knowledge acquisition. However, information extraction faces different technical difficulties in different domains. In biomedical literature, the main difficulties are recognizing specialized biomedical entities and extracting the relations between them from text. This thesis focuses on extracting relations between Diseases and Symptoms. Compared with other relation types (e.g., Disease-Drug and Disease-Gene), the main difficulty of Disease-Symptom relation extraction lies in identifying Symptoms. On the one hand, there is no complete vocabulary of Symptom entities; on the other hand, a Symptom is often described by multiple words or phrases, which makes Symptom recognition harder still. In addition, no ready-made corpus is currently available for training Disease-Symptom relation extraction.

To fill this gap, this thesis builds a Disease-Symptom relation corpus using a pattern-matching method. Corpus construction involves named entity recognition and syntactic analysis, supplemented by manual proofreading. On this self-built dataset, a bidirectional LSTM (BI-LSTM) model is used to extract relations between disease and symptom entities, and it is compared with conventional neural network models such as the convolutional neural network (CNN) and the Long Short-Term Memory (LSTM) network. The experimental results show that the BI-LSTM model performs best, achieving an F-score of 84.65% on the constructed corpus. In addition, a multi-task learning model is used to add corpora from related tasks to training, and the experimental results show that this improves model performance on the Disease-Symptom corpus.
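The pattern-matching idea behind the corpus construction can be illustrated with a minimal sketch. It is a simplification under stated assumptions: the disease and symptom lexicons and the trigger phrases below are hypothetical toy lists (the thesis does not publish its vocabularies or patterns), plain dictionary lookup stands in for named entity recognition, and surface regular expressions stand in for syntactic analysis. Candidate pairs produced this way would still need the manual proofreading step described above.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicons; not the thesis's actual vocabularies.
DISEASES = ["influenza", "migraine", "pneumonia"]
SYMPTOMS = ["fever", "headache", "nausea", "cough", "chest pain"]

# Hypothetical trigger phrases assumed to link a disease to its symptoms.
TRIGGERS = ["presents with", "is characterized by", "causes", "manifests as"]


def _alternation(terms: List[str]) -> str:
    # Longest terms first so multi-word symptoms such as "chest pain" are preferred.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Disease mention, then (within the same sentence) a trigger, then the remainder.
PATTERN = re.compile(
    r"\b(?P<disease>" + _alternation(DISEASES) + r")\b"
    r"[^.]*?\b(?:" + _alternation(TRIGGERS) + r")\b"
    r"(?P<rest>[^.]*)",
    re.IGNORECASE,
)
SYMPTOM_RE = re.compile(r"\b(?:" + _alternation(SYMPTOMS) + r")\b", re.IGNORECASE)


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return candidate (disease, symptom) pairs matched by the toy patterns."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        disease = match.group("disease").lower()
        # Every lexicon symptom after the trigger becomes a candidate relation.
        for symptom in SYMPTOM_RE.finditer(match.group("rest")):
            pairs.append((disease, symptom.group(0).lower()))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("Influenza presents with fever and headache."))
    # [('influenza', 'fever'), ('influenza', 'headache')]
```

Sentences that yield at least one pair could be kept as positive candidates, and sentences mentioning both entity types without matching a pattern could serve as negative candidates, both subject to manual review.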
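To show what a BI-LSTM relation classifier computes, here is a minimal forward-pass sketch in plain Python. It is not the thesis's model: the weights are random and untrained, the dimensions are arbitrary, and the choices of max-pooling over time followed by a linear softmax layer, along with the label names `disease-symptom` and `none`, are assumptions made for illustration. What it does show is the defining BI-LSTM structure: one LSTM reads the sentence left to right, a second LSTM with separate parameters reads it right to left, and their hidden states are concatenated at each token position.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTM:
    """Single-direction LSTM with randomly initialized (untrained) weights."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate reads the concatenation [x_t; h_{t-1}]
        self.W = {n: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
                  for n in self.GATES}
        self.b = {n: [0.0] * hidden_dim for n in self.GATES}

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = x + h  # list concatenation, not element-wise addition
            pre = {n: [a + b for a, b in zip(matvec(self.W[n], z), self.b[n])] for n in self.GATES}
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            o = [sigmoid(v) for v in pre["output"]]
            g = [math.tanh(v) for v in pre["cell"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]  # new cell state
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]  # new hidden state
            outputs.append(h)
        return outputs


def bilstm_encode(xs: List[Vector], fwd: LSTM, bwd: LSTM) -> List[Vector]:
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-align backward states with token positions
    return [f + b for f, b in zip(forward, backward)]  # concatenate both directions


def classify(xs: List[Vector], fwd: LSTM, bwd: LSTM,
             W_out: List[Vector], labels: List[str]) -> Dict[str, float]:
    states = bilstm_encode(xs, fwd, bwd)
    pooled = [max(col) for col in zip(*states)]  # max-pool each feature over time
    logits = matvec(W_out, pooled)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


rng = random.Random(0)
dim, hidden = 4, 3
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]  # stand-in word vectors
fwd, bwd = LSTM(dim, hidden, rng), LSTM(dim, hidden, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)] for _ in range(2)]
probs = classify(tokens, fwd, bwd, W_out, ["disease-symptom", "none"])
print(probs)
```

In practice such a model would be trained with a deep learning framework, and the input vectors would typically be word embeddings, possibly augmented with position features that mark the disease and symptom mentions. This sketch covers only the forward computation.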
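For reference, the F-score reported above is the harmonic mean of precision and recall. The abstract does not say how it was averaged, so the sketch below assumes micro-averaging over the positive relation labels, with a negative label such as `none` excluded from the counts. The label names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall_f1(gold: Sequence[str], pred: Sequence[str],
                        positive_labels: Iterable[str]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over the given positive labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    positive = set(positive_labels)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in positive and p == g:
            tp += 1  # correct relation predicted
        else:
            if p in positive:
                fp += 1  # predicted a relation that is wrong or absent
            if g in positive:
                fn += 1  # missed or mislabeled a gold relation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["rel", "rel", "none", "none", "rel"]
pred = ["rel", "none", "rel", "none", "rel"]
print(precision_recall_f1(gold, pred, {"rel"}))
```

Here two relations are found correctly, one is missed and one is spurious, so precision, recall and F1 all equal 2/3.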
Keywords/Search Tags: Relationship Extraction, Syntactic Analysis, Pattern Matching, BI-LSTM, Deep Learning