The Study Of Text-Mining Based Biomedical Entity Relation Extraction

Posted on: 2019-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Zheng
GTID: 1368330545969092
Subject: Computer Science and Technology
Abstract/Summary:
The vast body of unstructured biomedical literature contains rich and valuable biomedical knowledge and is an important source of knowledge for practitioners in the biomedical field. Because of the unique characteristics of biomedical texts, technologies such as text mining are urgently needed to extract and understand the knowledge they contain. Relation extraction between biomedical entities is one of the most basic information extraction tasks in the biomedical domain. Research on it has important theoretical and practical value for the construction of domain databases and knowledge graphs, as well as for the development of related fields in the life sciences and text mining.

This thesis focuses on relation extraction between biomedical entities. Working with two types of text, sentences and articles, we analyze the problems of existing supervised learning methods and address them through research on feature representation learning, model construction, and knowledge integration.

Most biomedical corpora are small and contain long, complicated sentences, which leads to low performance in systems that extract sentence-level relations from them. This thesis proposes an effective graph kernel that makes full use of different types of context. The kernel captures relations among both close-range and long-range tokens, and these relations involve direct as well as various indirect contexts. Experimental results show that the approach improves the performance of systems that extract entity relations from literature containing many long, structurally complicated sentences. The approach is characterized by higher precision and does not require a large corpus.

For corpora of appropriate size, relation extraction systems that automatically learn semantic representations from sentence-level text still perform unsatisfactorily. To address this problem, we propose Att-BLSTM, a model that classifies relations in the literature by combining a candidate-drug-oriented input attention acting on word embeddings with a recurrent neural network with long short-term memory (LSTM) units. By introducing input attention, the model highlights the words most influential for determining the relation type in the long sentences of biomedical text, which alleviates the bias deficiency of LSTM to some extent. Att-BLSTM depends on only three types of input embedding vectors. Experimental analysis indicates that the approach effectively recognizes both close-range and long-range patterns among words and improves overall performance on the drug-drug interaction (DDI) task.

For extracting relations between concept-level entities within an article, where a relation may span sentence boundaries, most systems use traditional machine learning methods and rely on feature engineering. We therefore propose an effective hierarchical document-level neural model designed around the characteristics of cross-sentence relation text and the topic of the article. In this approach, candidate entities occurring in multiple sentences of an article are masked so that the model purposefully collects valuable context around them. The model depends on only two types of input embedding vectors. Experimental analysis indicates that the approach automatically and effectively recognizes both inter- and intra-sentential relations between chemical and disease entities. Moreover, it is a general method.

Finally, in existing relation extraction systems, the semantic representations of domain knowledge and of text are learned separately during training. To address this problem, we propose an approach that integrates domain knowledge with text to extract relations between entities. The approach relies on the semantic representations of the text and uses an attention mechanism to learn representations of domain knowledge. Experimental results demonstrate that domain knowledge representations learned in this way can distinguish the influence of different pieces of knowledge on a specific entity pair. The method also improves system performance on chemical-disease relation extraction, especially for inter-sentential relations.

In summary, this thesis proposes effective methods and models for extracting relations between biomedical entities, addressing the problems that arise under different conditions of text granularity. Experimental results show that they achieve state-of-the-art performance compared with related systems.
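
The candidate-entity masking step mentioned in the abstract can be illustrated with a minimal sketch. It is only an illustration under assumptions, not the thesis's implementation: the placeholder tokens `<CHEM>` and `<DIS>`, the function names, and the token-level greedy matching are all hypothetical choices made here for clarity.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder tokens; the abstract does not name the ones used.
CHEMICAL_MASK = "<CHEM>"
DISEASE_MASK = "<DIS>"

Mention = Tuple[str, ...]  # a lowercased, possibly multi-token entity mention


def mask_sentence(tokens: Sequence[str], mentions: Sequence[Mention], mask: str) -> List[str]:
    """Replace every occurrence of a mention with a single mask token.

    Longer mentions are tried first so that "gastric ulcer" wins over "ulcer".
    Matching is case-insensitive and purely token-based (an assumption).
    """
    ordered = sorted((m for m in mentions if m), key=len, reverse=True)
    masked: List[str] = []
    i = 0
    while i < len(tokens):
        for mention in ordered:
            window = tuple(t.lower() for t in tokens[i:i + len(mention)])
            if window == mention:
                masked.append(mask)
                i += len(mention)
                break
        else:
            # No mention starts at position i; keep the original token.
            masked.append(tokens[i])
            i += 1
    return masked


def mask_document(
    sentences: Sequence[Sequence[str]],
    chemical_mentions: Sequence[Mention],
    disease_mentions: Sequence[Mention],
) -> List[List[str]]:
    """Mask the candidate chemical and disease in every sentence of an article."""
    result = []
    for tokens in sentences:
        step = mask_sentence(tokens, chemical_mentions, CHEMICAL_MASK)
        result.append(mask_sentence(step, disease_mentions, DISEASE_MASK))
    return result


if __name__ == "__main__":
    article = [
        ["Aspirin", "may", "cause", "gastric", "ulcer", "."],
        ["The", "ulcer", "healed", "after", "aspirin", "was", "stopped", "."],
    ]
    for sent in mask_document(article, [("aspirin",)], [("gastric", "ulcer"), ("ulcer",)]):
        print(" ".join(sent))
```

Masking every mention of the candidate pair across the article gives a downstream model the same two tokens to attend around in each sentence, whether the relation is stated within one sentence or across several.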
Keywords/Search Tags: Biomedical literature, Relation extraction, Neural network, Domain knowledge, Attention