
Research On Automatic Assignment And Prediction Of Disease Codes Based On Electronic Medical Records

Posted on: 2023-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yu
GTID: 1524307310462784
Subject: Computer application technology
Abstract/Summary:
Assigning International Classification of Diseases (ICD) codes is the first step toward statistics on global disease information that are comparable across regions and over time. It is basic, necessary work for the analysis and application of electronic medical records (EMRs). Automatic ICD code assignment based on massive EMR data can help coders complete coding more efficiently and accurately. However, the length of clinical notes, the large number of disease codes, and the complexity of the classification taxonomy and coding rules make the task very challenging. Drawing on the characteristics of clinical text, the hierarchical structure of the ICD classification system, and the recognition of medical entity concepts, this thesis explores the automatic assignment and prediction of disease codes using deep learning methods. The main research innovations of this thesis are as follows:

(1) An automated ICD coding method for Chinese EMRs based on character-enhanced representation is proposed. Natural language understanding of Chinese text is usually based on word segmentation, which ignores the semantic information of individual Chinese characters. This thesis therefore proposes an enhanced word embedding representation that incorporates the semantic features of individual Chinese characters, and uses a two-layer bidirectional LSTM network with a context attention mechanism to extract the features of clinical text. Experimental results on the Xiangya Chinese clinical dataset and the public English dataset MIMIC-III show that the character-enhanced word embedding representation captures text features effectively and improves the prediction performance of ICD coding. It achieves the best performance on the micro-average F1 and Hamming loss metrics. Owing to the characteristics of the Chinese language, the improvement on the Chinese dataset is more pronounced.

(2) An automated ICD coding method based on the fusion of semantic section features is proposed. When the whole clinical text is taken as the input of a deep neural network model, information may be lost when a lengthy text is truncated, and the correlation between different information sections and the disease codes is ignored. In this thesis, multiple semantic sections of the discharge summary are extracted as the inputs instead of the original text. A label attention mechanism is used to capture the label-specific features of each section, and the disease codes are predicted by fusing the features of all semantic sections. Experimental results on the MIMIC-III dataset show that different semantic sections contribute differently to coding. The model that integrates semantic section features makes fuller use of the text information and improves the performance of disease code assignment.

(3) An automated ICD coding method based on multi-granularity label prediction is proposed. The ICD taxonomy is a hierarchical classification system, which makes automated ICD coding different from common multi-label classification tasks. To take advantage of the inheritance inherent in the hierarchical structure of the ICD classification, a multi-task model for ICD coding is proposed, which predicts disease codes at multiple label granularities. The method builds on semantic feature representations captured by a bidirectional LSTM network. Multiple label-specific features of different granularities are learned by splitting the semantic features. A feature reinforcement strategy is employed to integrate fine-granularity features into coarse-granularity label classification. Experiments on four datasets verify the effectiveness of the method. The improvement in macro-average F1 shows that prediction performance on rare labels is improved.

(4) A prediction method for the tentative primary diagnosis based on the fusion of medical concept features is proposed. The accuracy of the primary diagnosis code is one of the important factors affecting the quality of ICD coding. Based on the recognition and feature representation of symptom concepts, a tentative primary diagnosis code prediction method is explored that combines text semantic features with medical concept features. On the one hand, a bidirectional LSTM network combined with a context attention mechanism is used to capture the contextual semantic features of the text; on the other hand, MetaMap is used to extract symptom concepts from the text, and the TF-IDF algorithm is then used to vectorize each symptom concept with disease-related weights. The primary diagnosis code is predicted using two strategies for fusing semantic features and symptom concept features. Experimental results on the public dataset MIMIC-III show that fusing the different features yields better accuracy in the primary diagnosis prediction task, and that the recognition and feature representation of symptom concepts provide a useful supplement to the semantic features of the text.
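The abstract reports results in terms of micro-average F1, macro-average F1, and Hamming loss, and reads a gain in macro-average F1 as a gain on rare labels. The following is a minimal sketch of the standard definitions of these three multi-label metrics, not the thesis's evaluation code. The function names, the label names (`common_1`, `common_2`, `rare`), and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Pool true/false positives and false negatives over all labels, then compute F1."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold: List[Set[str]], predicted: List[Set[str]],
             labels: Sequence[str]) -> float:
    """Compute F1 per label, then take the unweighted mean over labels."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, predicted) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, predicted) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hamming_loss(gold: List[Set[str]], predicted: List[Set[str]],
                 labels: Sequence[str]) -> float:
    """Fraction of (document, label) decisions that are wrong; lower is better."""
    wrong = sum(len(g ^ p) for g, p in zip(gold, predicted))
    return wrong / (len(gold) * len(labels))


# Hypothetical toy data: four documents, three labels, one of them rare.
labels = ["common_1", "common_2", "rare"]
gold = [{"common_1", "common_2"}, {"common_1"}, {"common_1", "rare"}, {"common_2"}]
predicted = [{"common_1", "common_2"}, {"common_1"}, {"common_1"},
             {"common_1", "common_2"}]

print("micro F1:", micro_f1(gold, predicted))
print("macro F1:", macro_f1(gold, predicted, labels))
print("Hamming loss:", hamming_loss(gold, predicted, labels))
```

In this toy data the only occurrence of the rare label is missed. Because macro-average F1 weights every label equally, that single miss pulls it well below micro-average F1, which is dominated by the frequent labels. This is why macro-average F1 is the metric that reflects changes on rare codes.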
Keywords/Search Tags: Electronic medical records, Assignment of disease code, ICD, Automatic assignment, LSTM, Attention mechanism