
Research On The Identification And Standardization Of Medical Named Entities From Clinical Real-World Data

Posted on: 2024-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: F X Feng
GTID: 2530306938964379
Subject: Information Science
Abstract/Summary:
With the rapid development of computing and biotechnology, clinical real-world data resources such as electronic medical records are growing exponentially. These data contain a wealth of knowledge and have become a valuable resource for biomedical research, so using advanced information technology and artificial intelligence to assist clinical real-world data mining has become increasingly important. One foundational task is to extract medical entities accurately and efficiently and to link each of them to a single standardized concept. This helps process and leverage large-scale text data and supports downstream applications such as information extraction, data sharing, and knowledge graph construction. Taking the electronic medical records in clinical real-world data as an example, this thesis therefore annotates a separate dataset for each task, studies methods for recognizing and standardizing clinical medical named entities, and applies them to a collection and processing platform for a Chinese clinical medical terminology system.

Because current Chinese clinical named entity recognition models do not make full use of the features of Chinese characters, this thesis proposes a clinical named entity recognition method that integrates Chinese character glyph (shape) features into the RoBERTa-wwm pre-trained model. First, RoBERTa-wwm (pre-trained with whole word masking) is used to obtain a full text representation of the Chinese character information. This representation is combined with the glyph features and fed into a BiLSTM-CRF layer: the BiLSTM captures sequence information, and the CRF constrains the transitions between tags to complete named entity recognition. The method reached F1 scores of 89.19% on the self-annotated dataset and 91.58% on the CCKS2019 dataset, and comparative experiments show that it improves performance on the Chinese clinical named entity recognition task.

Because existing standardization datasets for Chinese medical named entities cover only a single entity type, and standardization performance still needs improvement, this thesis proposes an entity standardization method that combines a similarity algorithm with a pre-trained model, and annotates a standardization dataset covering five entity types according to a standard vocabulary. Model comparisons show that the method can be applied to multi-type entity standardization and that the combination of the Jaccard similarity algorithm and the BERT pre-trained model performs best. The thesis then improves this method by exploiting the similarity between entity aliases. The improved method reaches an F1 score of 94.87%, 16.69% higher than before the improvement, which verifies its effectiveness for standardizing entities in Chinese electronic medical records.

Finally, to meet the need for automatically constructing a Chinese clinical medical terminology system, the proposed methods are deployed in the Chinese clinical medical terminology collection and processing platform, which collects electronic medical record data from clinical real-world data and performs automatic medical named entity recognition and standardization.
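The abstract says the CRF layer constrains the sequence relationships between tags. A minimal sketch of what that means at decoding time, assuming a BIO tagging scheme with two hypothetical entity types (DIS, DRUG) and hand-written emission scores standing in for BiLSTM outputs: transitions the BIO scheme forbids, such as O followed by I-DIS, receive a score of negative infinity, so Viterbi decoding can never output them. A trained CRF learns its transition scores from data; hard-coding them here only illustrates the constraint and is not the thesis's implementation.

```python
from typing import List

TAGS = ["O", "B-DIS", "I-DIS", "B-DRUG", "I-DRUG"]  # hypothetical BIO tag set
NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: an I-X tag may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        etype = curr[2:]
        return prev in (f"B-{etype}", f"I-{etype}")
    return True


def transition_matrix(tags: List[str]) -> List[List[float]]:
    # 0.0 for legal transitions, -inf for transitions the BIO scheme forbids.
    return [[0.0 if allowed(p, c) else NEG_INF for c in tags] for p in tags]


def viterbi(emissions: List[List[float]], tags: List[str],
            trans: List[List[float]]) -> List[str]:
    """Return the best tag sequence; emissions[t][j] scores tag j at position t."""
    n = len(tags)
    # A sentence may not start inside an entity, so I- tags are banned at t = 0.
    score = [NEG_INF if tags[j].startswith("I-") else emissions[0][j] for j in range(n)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            # Best previous tag i for current tag j, including the transition score.
            best_i = max(range(n), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return [tags[j] for j in reversed(path)]


# Toy emission scores (hypothetical) for the four characters of 患高血压 ("has hypertension").
emissions = [
    [2.0, 0.5, 0.0, 0.1, 0.0],
    [0.2, 1.5, 1.8, 0.1, 0.0],
    [0.1, 0.2, 2.0, 0.0, 0.3],
    [0.3, 0.1, 1.9, 0.0, 0.2],
]
print(viterbi(emissions, TAGS, transition_matrix(TAGS)))
# ['O', 'B-DIS', 'I-DIS', 'I-DIS']
```

With these toy scores, choosing the highest-scoring tag at each position independently would give O, I-DIS, I-DIS, I-DIS, which is invalid because an entity cannot begin with an I- tag; the constrained decoder returns O, B-DIS, I-DIS, I-DIS instead.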
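The abstract reports F1 scores for recognition without stating the matching criterion. The sketch below assumes strict entity-level matching, a widely used convention for NER evaluation, in which a predicted entity counts as correct only if its start, end, and type all match a gold entity. The helper names `bio_to_spans` and `entity_f1` and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the current entity continues
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray I- tag that does not continue an entity is ignored here.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact span-and-type matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["O", "B-DIS", "I-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG"]]
pred = [["O", "B-DIS", "I-DIS", "O", "O", "B-DRUG", "I-DRUG"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the predicted disease span is one character too short, so only the drug span counts as a true positive and precision, recall, and F1 are all 0.5. A relaxed criterion that credits partial overlaps would score the same prediction higher.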
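The abstract does not say exactly how Jaccard similarity and BERT are combined or how alias similarity is scored. One plausible reading, sketched below under that assumption, uses character-level Jaccard similarity to rank candidate standard concepts and lets each concept score as high as its best-matching alias. The toy vocabulary, the `standardize` function, and the `top_k` cutoff are hypothetical, and the BERT stage is omitted.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity: |A ∩ B| / |A ∪ B| over character sets."""
    sa, sb = set(a), set(b)
    if not (sa or sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)


def standardize(mention: str, vocab: Dict[str, List[str]], top_k: int = 3,
                use_aliases: bool = True) -> List[Tuple[str, float]]:
    """Rank standard concepts for a mention by their best-matching name."""
    scored = []
    for concept, aliases in vocab.items():
        # Hypothetical alias step: a concept scores as high as its closest alias.
        names = [concept] + aliases if use_aliases else [concept]
        scored.append((concept, max(jaccard(mention, n) for n in names)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy standard vocabulary (hypothetical): canonical name -> aliases.
VOCAB = {
    "冠状动脉粥样硬化性心脏病": ["冠心病"],
    "病毒性心肌炎": [],
    "原发性高血压": ["高血压病"],
}

print(standardize("冠心病", VOCAB, top_k=1, use_aliases=False))
print(standardize("冠心病", VOCAB, top_k=1))  # [('冠状动脉粥样硬化性心脏病', 1.0)]
```

Scoring canonical names alone ranks 病毒性心肌炎 (viral myocarditis, 2/7) above 冠状动脉粥样硬化性心脏病 (coronary atherosclerotic heart disease, 3/12) for the mention 冠心病, because character overlap favors the shorter name. Letting the alias 冠心病 vote lifts the correct concept to 1.0. In a fuller pipeline, the top-k candidates could then be passed to a BERT-based matcher for re-ranking, which is not shown here.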
Keywords/Search Tags: Named entity recognition, Entity standardization, Entity mapping, Text matching