Research On Medical Domain Knowledge Extraction Methods

Posted on: 2019-07-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H D Li
GTID: 1368330566998639
Subject: Computer Science and Technology

Abstract/Summary:
Knowledge extraction is the process of analysing, identifying, understanding and associating the knowledge in a source. Entity recognition and entity relation extraction from natural language text are both important research directions in knowledge extraction. In the general domain, methods for extracting entities and their relations from unstructured text have been widely used to construct Knowledge Graphs (KGs). These methods usually involve part-of-speech tagging, named entity recognition, text categorisation, and many other natural language processing (NLP) technologies. In the medical domain, a great deal of medical knowledge has accumulated in medical records, but because knowledge extraction is complex and the applicable standards are strict, most medical knowledge bases are still built manually by experts. Automatic knowledge extraction methods for the medical domain therefore need further study.

This thesis studies medical knowledge extraction from the perspective of natural language processing. Our goal is to automatically extract medical entities and their relations from medical texts. In this thesis, medical knowledge extraction consists of three main steps: 1) extract entity descriptions from the medical texts, 2) map the extracted entity descriptions to standard entities, and 3) extract relations from the medical texts to establish the relations between the standard entities. These three steps correspond to three natural language processing tasks: Entity Recognition, Entity Normalization and Relation Extraction, respectively. The main contents of this thesis are given below.

First, we study entity extraction from medical texts. Taking the risk factors of heart disease as an example, we extract medical entities and events, such as related diseases, drugs, treatments, genetic history and unhealthy lifestyle, from the long-term electronic medical records of heart disease patients, and determine when each of them occurred. The risk factors for heart disease are divided into three types according to how they are described. Following the practice of sequence labeling, each type of risk factor is modelled and extracted separately. The same classifier is then used to determine the temporal relation of the extracted risk factors. Experiments show that the proposed method achieves a micro-averaged F1 of 92.68% on the i2b2 2014 dataset, which verifies its effectiveness.

Second, we study entity normalization in medical texts. Because medical entities are described in many different ways, it is difficult to map the entity descriptions found in text to the corresponding entries in knowledge bases. In this thesis, we study the normalization of diseases, symptoms, and drugs in academic literature and electronic medical records. We first use sieve-based filters to construct candidates. A pair of vectors, learned with convolutional neural networks (CNNs), is then used to represent entities and entity descriptions. Finally, a learning-to-rank method selects the normalized entity. Experiments show that our method is effective on the NCBI disease dataset, the BioCreative 2016 CDR dataset and the ShARe/CLEF electronic medical record dataset.

Third, we study entity relation extraction in medical texts. An entity may be mentioned multiple times in a medical text, and a relation may be described either within a single sentence or across sentences. This differs from most cases in the general domain, where relations are usually extracted from a single sentence. In this thesis, a supervised learning model based on piecewise CNNs is adapted to extract intra-sentence and inter-sentence entity relations in a uniform way. In addition, we use an attention mechanism and domain expertise to assist the extraction without manual feature engineering, such as syntactic structures. Experiments on a public dataset validate the effectiveness of this method. Building on the supervised CNN-based model, we propose a multi-instance weakly supervised model based on recurrent neural networks (RNNs) for the multi-instance problem, which overcomes the supervised model's limitation of processing only a single instance with one label. Experimental results on the public dataset demonstrate that the multi-instance weakly supervised method achieves higher extraction performance than the supervised methods.

Fourth, we study drug combination based on a medical KG. To solve the drug combination problem, we introduce a treatment-oriented knowledge framework for the medical KG, which includes a top-down KG building method that extracts knowledge from internet resources, a bottom-up method for extracting drug ingredients and interactions, and KG-based drug combination reasoning. Because the mention of the head entity of a drug interaction relation is always omitted in drug instructions, traditional binary relation extraction methods cannot be applied directly. To address this issue, we propose an RNN-based method for extracting relations from texts in which the head entity is missing, which removes the requirement of traditional binary relation methods that the positions of both entities be known. Our experimental results verify the effectiveness of the proposed method.
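
To make the piecewise CNN idea mentioned for relation extraction more concrete, the following is a minimal sketch of piecewise max pooling as it is generally described for such models: the convolution output is split into three segments by the positions of the two entity mentions, and each segment is max-pooled per filter. It is not the adapted model from this thesis. The function name `piecewise_max_pool`, the list-of-lists feature layout, and the choice of 0.0 for an empty segment are all assumptions made for illustration.

```python
from typing import List


def piecewise_max_pool(features: List[List[float]], head: int, tail: int) -> List[float]:
    """Illustrative piecewise max pooling (hypothetical helper, not the thesis code).

    features: one row per token, one column per convolution filter.
    head, tail: token indices of the two entity mentions (assumed to be given).
    Returns the per-filter maxima of three segments, concatenated.
    """
    if not features:
        raise ValueError("features must contain at least one token")
    first, second = sorted((head, tail))
    if first < 0 or second >= len(features):
        raise ValueError("entity positions must be valid token indices")

    # Segment 1 ends at the first entity, segment 2 ends at the second entity,
    # segment 3 is the rest of the sequence.
    segments = [
        features[: first + 1],
        features[first + 1 : second + 1],
        features[second + 1 :],
    ]

    n_filters = len(features[0])
    pooled: List[float] = []
    for segment in segments:
        for f in range(n_filters):
            # Assumption: an empty segment contributes 0.0 for every filter.
            pooled.append(max((row[f] for row in segment), default=0.0))
    return pooled


if __name__ == "__main__":
    # Five tokens, two filters; entity mentions at token 1 and token 3.
    toy = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4], [0.8, 0.1], [0.2, 0.6]]
    print(piecewise_max_pool(toy, head=1, tail=3))
```

Compared with a single max over the whole sequence, pooling per segment keeps some information about where a strong feature occurred relative to the two entities, which is the usual motivation given for the piecewise design.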
Keywords/Search Tags:Medical Knowledge Extraction, Entity Recognition, Entity Normalization, Entity Relation Extraction, Multiple-Instance Learning, Pharmaceuticals Combination Relation