Research On Medical Empirical Knowledge Extraction From Clinical Text

Posted on: 2019-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B He
GTID: 1364330590972855
Subject: Computer application technology
Abstract/Summary:
Doctors are a scarce resource around the world, and they are under tremendous pressure from the large number of people seeking medical care. Because the medical resources available to each patient are limited, doctors hope to use medical information technology to improve both the efficiency of medical practice and the quality of care, while patients want high-quality medical services that let them manage their own health. The development of these techniques and services relies heavily on medical knowledge, so research on methods for automated medical knowledge extraction is imperative.

Medical literature, medical books, and electronic medical records are the main sources of medical knowledge. Unlike the other sources, electronic medical records are patients' personalized health data and contain a large amount of empirical knowledge accumulated in clinical practice. Research on methods for extracting this empirical knowledge is still at a preliminary stage. Clinical text is the most knowledge-rich data type in electronic medical records, and it is also a high-confidence knowledge source that directly reflects doctors' experience in medical practice. It is therefore necessary to study methods for medical knowledge extraction from clinical text.

Clinical text is a specific type of text with its own sublanguage characteristics. The mixed use of common language and sublanguage, together with the diverse expressions of medical terminology, makes medical entity boundaries harder to recognize. In addition, long narrative sentences are common in clinical text. As a result, different entity pairs in the same sentence share similar contexts and some entity pairs are separated by a large distance, which complicates the medical relation classification task. Based on these textual characteristics, this dissertation studies methods for medical empirical knowledge extraction from clinical text. The main research contents cover the following five aspects.

The first part is corpus construction for medical entities and entity relations in clinical text. The lack of annotated corpora of Chinese clinical text hinders the development of related research. Based on the characteristics of Chinese clinical text, we develop an annotation scheme for medical entities, assertions, and relations, and write annotation guidelines for corpus construction. We also propose an iterative annotation method to train annotators and refine the guidelines, and adopt a variety of quality assurance measures while building the corpus. The corpus lays an important data foundation for research on medical empirical knowledge extraction from clinical text.

The second part is character-based conditional random fields (CRFs) for medical entity recognition. The sublanguage characteristics of clinical text greatly limit the performance of open-domain word segmenters, and the resulting segmentation errors accumulate in the subsequent recognition of medical entity boundaries. To address this problem, we construct a word segmenter dedicated to clinical text and use it to extract word features for the medical entity recognition model, which reduces entity boundary errors. In addition, we build a character-based CRF model to identify medical entities, which avoids the error accumulation caused by word segmentation.

The third part is a character-based long short-term memory (LSTM) network with a CRF layer (LSTM-CRF) for medical entity recognition. Because NLP resources for Chinese clinical text are scarce and the entity corpus is small, we explore how well deep learning methods perform on medical entity recognition. According to the characteristics of Chinese clinical text, we design several LSTM-CRF models to identify medical entities, and we examine model performance under different character and word embedding initializations.

The fourth part is multi-pooling convolutional neural networks (CNNs) for medical relation classification. Clinical text is dense with medical entities, so multiple entities often appear in the same sentence and different entity pairs in that sentence share similar contexts. The max-pooling operation in a traditional CNN cannot retain the position of features relative to the entity pair. We therefore propose a multi-pooling CNN, which performs a segmented max-pooling operation based on the position of the entity pair in each relation sample, to classify medical entity relations. We also propose a training method that introduces a category-level constraint to keep parameter updates independent between relation categories.

The fifth part is convolutional gated recurrent units (GRUs) for medical relation classification. Clinical text contains many long sentences, so some entity pairs are separated by a large distance. A traditional CNN cannot capture dependencies between long-distance features, while recurrent neural networks (RNNs) do not extract local features as accurately as a CNN. We therefore propose a convolutional GRU model that combines the advantages of CNNs and RNNs for medical relation classification, and we compare the effect of an attention mechanism with that of the traditional max-pooling operation on model performance.

In summary, according to the characteristics of clinical text, this dissertation studies medical entity recognition and medical relation classification on this text type and significantly improves performance on both tasks. It provides the capability to extract medical empirical knowledge for medical services. We expect these methods to be extended to other data types to further promote the development of medical artificial intelligence.
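To make the segmented max-pooling idea from the fourth part concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes the convolution output is a list of per-token feature vectors and that the sentence is split into three segments by the two entity positions (up to the first entity, up to the second entity, and the remainder). The function name `segmented_max_pool`, the three-way split, and the zero padding for empty segments are all illustrative assumptions; the abstract does not specify these details.

```python
from typing import List, Sequence


def segmented_max_pool(
    features: Sequence[Sequence[float]], e1: int, e2: int
) -> List[float]:
    """Max-pool each filter separately over segments defined by an entity pair.

    features: one row per token position, one column per convolution filter.
    e1, e2:   token indices of the two entities (in either order).
    Returns the concatenated per-segment maxima (3 * number of filters values).
    """
    if not features:
        raise ValueError("features must contain at least one token row")
    n_tokens = len(features)
    n_filters = len(features[0])
    first, second = sorted((e1, e2))
    if first < 0 or second >= n_tokens:
        raise ValueError("entity indices must lie inside the sentence")

    # Assumed segmentation: [0..first], (first..second], (second..end).
    segments = [
        features[: first + 1],
        features[first + 1 : second + 1],
        features[second + 1 :],
    ]

    pooled: List[float] = []
    for segment in segments:
        if segment:
            # Column-wise maximum over the token rows of this segment.
            pooled.extend(max(row[k] for row in segment) for k in range(n_filters))
        else:
            # Assumption: an empty segment contributes zeros.
            pooled.extend([0.0] * n_filters)
    return pooled


# Hypothetical toy input: 5 tokens, 2 filters, entities at positions 1 and 3.
toy = [[1, 0], [0, 5], [3, 2], [2, 2], [0, 9]]
result = segmented_max_pool(toy, 1, 3)
```

Ordinary max-pooling over the same toy input would keep only one maximum per filter for the whole sentence, so two entity pairs in that sentence would receive the same pooled vector. Pooling per segment yields a different vector for each entity pair, which is the property the abstract attributes to the multi-pooling CNN.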
Keywords/Search Tags: clinical text, corpus construction, named entity recognition, relation classification, neural network