Research On Named Entity Recognition And Relation Extraction For Medical Texts

Posted on: 2021-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: D H Yue
GTID: 2404330602972574
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of medical information technology, unstructured medical text is becoming increasingly abundant. Extracting valuable information from medical texts, such as clinical electronic medical records (EMRs) and medical literature, is an important basis for advancing research on medical intelligence. Information extraction analyzes, recognizes and classifies the information in unstructured texts, and named entity recognition and relation extraction are two of its main directions. Many researchers have applied information extraction technology to the medical domain, but several challenges remain. Because of the diversity of medical language expressions, the implicitness and complexity of relation descriptions, and the scarcity of corpora for medical information extraction, it is difficult to achieve good named entity recognition and relation extraction results in the medical field. Organized by research object, the main contributions of this thesis are as follows:

(1) To address the limited availability of Chinese clinical EMR data and the weak generalization ability of existing work, a medical named entity recognition network based on cross-domain transfer, named T-BiLSTM-CRF, is proposed. First, a non-medical domain dataset is used to pre-train the source network; then the parameters of the target network are fine-tuned on the medical dataset. Training the source network yields preliminary parameters, which are used to initialize the target network. Guided by the source network, the target network converges faster and the learning ability of the model improves. Experimental results demonstrate that the proposed approach can automatically and effectively recognize medical entities, achieving a strict F1 of 85.43% on the CCKS 2018 evaluation dataset.

(2) To fully exploit the latent semantic relations between entities in Chinese medical texts, this thesis proposes a novel model, BERT-Att-CNN, which incorporates an attention mechanism. First, BERT encodes the input sequence of medical text to obtain a high-level representation of language features. Second, guided by the attention mechanism, a CNN selectively extracts useful features. Finally, a label smoothing cross-entropy loss function is introduced to optimize model training and mitigate the negative effects of imbalanced labels. In addition, to address the lack of Chinese medical relation datasets, this thesis builds the Chinese Medical 2019 dataset for medical relation extraction. The dataset mainly consists of manually annotated medical texts, such as medical textbooks and clinical pathways. Extensive experiments on Chinese relation extraction tasks show that BERT-Att-CNN outperforms the other methods compared. Its F1 values on the public SKE 2019 dataset and the self-built Chinese Medical 2019 dataset are 77.10% and 48.47%, respectively.
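The label smoothing cross-entropy loss mentioned in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the exact formulation or the smoothing coefficient, so the version below is an assumption: it uses the common variant that spreads a mass `epsilon` uniformly over all classes, and the function name and the default `epsilon=0.1` are illustrative rather than taken from the thesis.

```python
import math


def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross entropy between softmax(logits) and a smoothed one-hot target.

    Assumed formulation (not confirmed by the thesis): the target
    distribution puts (1 - epsilon) on the gold class plus epsilon / K
    spread uniformly over all K classes.
    """
    num_classes = len(logits)

    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_norm for x in logits]

    loss = 0.0
    for class_index, log_prob in enumerate(log_probs):
        # Smoothed target probability for this class.
        q = epsilon / num_classes
        if class_index == target:
            q += 1.0 - epsilon
        loss -= q * log_prob
    return loss


# Hypothetical relation classifier output over three relation types.
example_logits = [4.0, 0.5, -1.0]
plain = label_smoothing_cross_entropy(example_logits, target=0, epsilon=0.0)
smoothed = label_smoothing_cross_entropy(example_logits, target=0, epsilon=0.1)
print(plain, smoothed)
```

With `epsilon=0.0` the function reduces to ordinary cross entropy. A positive `epsilon` penalizes over-confident predictions, which is one plausible reason such a loss helps when some relation labels are much rarer than others.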
Keywords/Search Tags: Chinese medical text, named entity recognition, relation extraction, cross-domain transfer, BERT