Named Entity Recognition Of Chinese Electronic Medical Record Based On Attention Mechanism And Feature Fusion

Posted on: 2023-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
Full Text: PDF
GTID: 2544307064470394
Subject: Computer technology
Abstract/Summary:
Named entity recognition (NER) is a task downstream of text preprocessing. Its main purpose is to automatically extract entities from text, and it is the basis of tasks such as relation extraction and knowledge graph construction. Conventional NER mainly recognizes the names of people, places, and organizations in text. With the development of smart medicine, NER has been applied in medical fields such as electronic medical records and biomedicine. At present, NER on Chinese medical records faces long text sequences and highly specialized language, and most current models take only single character vectors as input, which does not make full use of word-level information. To let the neural network capture more comprehensive features and pay more attention to the key information, this dissertation studies a Chinese named entity recognition model based on attention and feature fusion. The specific research contents are as follows:

(1) To address the long sequences, strong specialization, and lack of word-level semantic representation in Chinese electronic medical records, a Chinese electronic medical record named entity recognition model based on RoBERTa-wwm (RoBERTa-wwm-IDCNN-CRF) is proposed. To better suit long medical texts and obtain richer word-meaning features, RoBERTa-wwm is used as the pre-training model to dynamically learn vectors that carry word-meaning features. First, the pre-trained language model produces the input vectors. Then IDCNN captures the distance dependencies between the words of long entities. Finally, a conditional random field (CRF) produces the final prediction. In experiments on the CCKS2019 dataset, the RoBERTa-wwm-IDCNN-CRF model reached an F1 value of 85.53%, showing that the model generalizes well and is suitable for the Chinese electronic medical record task.

(2) To address the single-structure feature vector of RoBERTa-wwm-IDCNN-CRF and cover the key information in text features more effectively, a Chinese electronic medical record named entity recognition model based on attention and fused features (R-DB-ATT-CRF) is proposed. Building on the RoBERTa-wwm-IDCNN-CRF model, a BiLSTM network is added to capture global features. These are combined with the local features learned by IDCNN (the combination is denoted DB) to obtain features at different scales, and the features are fused by concatenation along the feature dimension. A multi-head attention mechanism is then introduced, and weights are computed from the fused features to highlight the key features. Finally, CRF decodes the optimal annotation sequence. In experiments on the CCKS2019 dataset, the R-DB-ATT-CRF model reached an F1 value of 86.07%, showing that the model is effective and suitable for the Chinese electronic medical record task.

Figure [20] Table [13] Reference [66]...
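Both models in the abstract end with a CRF layer that decodes the optimal annotation sequence. The sketch below is not the thesis's code. It is a minimal pure-Python illustration of Viterbi decoding, the standard way a linear-chain CRF picks the best tag path. The function name `viterbi_decode`, the three-tag BIO scheme, and all scores are hypothetical values chosen for illustration. In a real model the emission scores would come from the upstream network and the transition scores would be learned.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag path.

    emissions[t][tag] is the score of `tag` at position t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy example: three characters, BIO tags.
tags = ["O", "B", "I"]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I")] = -100.0  # an I tag may not directly follow O

emissions = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.0, "B": 0.0, "I": 2.0},
    {"O": 1.5, "B": 0.0, "I": 0.0},
]

# Picking the best tag per position independently yields an invalid O, I, O.
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
path = viterbi_decode(emissions, transitions, tags)
print(greedy, path)
```

In this toy input the transition penalty makes the decoder prefer B at the first position even though O has the higher emission score there, which is the kind of sequence-level consistency a CRF layer adds on top of per-character scores.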
Keywords/Search Tags: named entity recognition, Chinese electronic medical record, pre-training model, feature fusion, attention mechanism