Research On Named Entity Recognition And Relation Extraction For Medical Texts

Posted on:2021-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:D H Yue

Full Text:PDF

GTID:2404330602972574

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of medical information technology, unstructured medical text is becoming increasingly abundant. Extracting valuable information from medical texts, such as clinical electronic medical records (EMRs) and medical literature, is an important basis for advancing research on medical intelligence. Information extraction analyzes, recognizes, and classifies the information in unstructured text; named entity recognition and relation extraction are two of its main tasks. Many researchers have applied information extraction to the medical domain, but several challenges remain. Because medical language is expressed in diverse ways, relation descriptions are implicit and complex, and corpora for medical information extraction are scarce, it is difficult to achieve good named entity recognition and relation extraction performance in the medical field. The main contributions of this thesis, organized by research object, are as follows:

(1) To remedy the limited availability of Chinese clinical EMR data and the weak generalization ability of existing work, a medical named entity recognition network based on cross-domain transfer, named T-BiLSTM-CRF, is proposed. First, a non-medical domain dataset is used to pre-train the source network; the parameters learned in this stage are used to initialize the target network, which is then fine-tuned on the medical dataset. Guided by the source network, the target network converges faster and learns more effectively. Experimental results demonstrate that the proposed approach can automatically and effectively recognize medical entities, achieving a strict F1 of 85.43% on the CCKS 2018 evaluation dataset.

(2) To fully exploit the latent semantic relationships between entities in Chinese medical texts, this thesis proposes a novel model, BERT-Att-CNN, which incorporates an attention mechanism. First, BERT encodes the input sequence of medical text to obtain a high-level representation of language features. Second, guided by the attention mechanism, a CNN selectively extracts useful features. Finally, a label smoothing cross-entropy loss function is introduced to optimize training and mitigate the negative effects of imbalanced labels. In addition, to address the lack of Chinese medical relation datasets, this thesis builds the Chinese Medical 2019 dataset for medical relation extraction. The dataset consists mainly of manually annotated medical texts, such as medical textbooks and clinical pathways. Extensive experiments on Chinese relation extraction tasks show that BERT-Att-CNN outperforms the other methods compared. Its F1 scores on the public SKE 2019 dataset and the self-built Chinese Medical 2019 dataset are 77.10% and 48.47%, respectively.
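
The abstract names a label smoothing cross-entropy loss but does not give its formula or smoothing factor. The following is a minimal sketch of one common formulation, in which the true class receives probability 1 - epsilon + epsilon / K and each other class receives epsilon / K for K classes. The function name, the value epsilon = 0.1, and the choice of uniform smoothing are assumptions made for illustration and are not taken from the thesis.

```python
import math


def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross entropy of one example against a uniformly smoothed target.

    logits  -- list of unnormalized class scores (hypothetical input)
    target  -- index of the true class
    epsilon -- smoothing factor (assumed value; not given in the abstract)
    """
    num_classes = len(logits)

    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    log_probs = [z - log_sum_exp for z in logits]

    loss = 0.0
    for k, log_p in enumerate(log_probs):
        # Every class gets epsilon / K; the true class gets the remainder.
        q = epsilon / num_classes
        if k == target:
            q += 1.0 - epsilon
        loss -= q * log_p
    return loss
```

With epsilon set to 0 this reduces to the ordinary cross-entropy loss. With epsilon greater than 0, a very confident correct prediction is penalized slightly, which discourages the model from collapsing onto frequent labels.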

Keywords/Search Tags:

Chinese medical text, named entity recognition, relation extraction, cross-domain transfer, BERT


Related items

1	Research On Medical Text Named Entity Recognition And Entity Relation Extraction Based On Machine Reading Comprehension Framework
2	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
3	Research On Knowledge Extraction Technology For Chinese Medical Text
4	Study On Named Entity Recognition Model Of Cancer Patient Online Questioning Text Based On Transfer Learning
5	Medical Text Information Extraction Based On Deep Learning
6	Research On Chinese Named Entity Recognition In Medical Field
7	Named Entity Recognition In Medical Field Based On Deep Learning Of Chinese
8	Medical Text Named Entity Recognition Based On Improved Sequence Labeling Model
9	Research On Chinese Electronic Medical Record Named Entity Recognition Based On BERT Embedding And Residual Connection
10	Medical Terminology Discovery And Application Based On Named Entity Recognition