
Research On Medical Text-oriented Entity Extraction And Concept Normalization Techniques

Posted on: 2021-01-19  Degree: Master  Type: Thesis
Country: China  Candidate: C Tang  Full Text: PDF
GTID: 2404330611498191  Subject: Computer technology
Abstract/Summary:
Driven by national strategic emphasis, electronic medical texts have accumulated into a wealth of raw data. Entity extraction and concept normalization techniques for medical texts, as key steps in knowledge extraction and utilization, therefore carry significant research value. Medical entity extraction mainly relies on named entity recognition (NER) technology to identify different types of medical concept entities in medical text. Entities extracted directly from text are often non-standard and difficult for downstream tasks to use as-is, so they must be normalized according to their surrounding context; this process is called concept normalization. Medical entity extraction and concept normalization are closely coupled and interrelated. Around these two research directions, this thesis carries out the following three lines of research:

(1) Entity extraction from Chinese and English medical texts based on deep learning. The entity extraction models use Long Short-Term Memory networks (LSTMs) and Convolutional Neural Networks (CNNs) for feature extraction, and a Conditional Random Field (CRF) for probability calculation. To mine textual information more fully, character-level representation learning is introduced to strengthen the models' ability to handle medical text that is non-standard or incorrectly written. This thesis builds BiLSTM-CRF (Bi for bidirectional), BiLSTM-LSTMs-CRF, and BiLSTM-CNNs-CRF deep learning models and compares them under different word embedding methods. Experiments are conducted on the i2b2 2010 data set and on a data set of 992 Chinese electronic medical records, analyzing how different representation learning methods affect recognition. The BiLSTM-LSTMs-CRF model with pre-trained GloVe word vectors performs best on the English corpus, reaching a micro-F1 of 84.70%, while the BiLSTM-CNNs-CRF model performs best on the 992 electronic medical records, reaching a micro-F1 of 89.13%.

(2) NormCG (Normalization CNN-GRU Model), a medical concept normalization model based on deep learning. NormCG uses a CNN to extract morphological features of mentions and candidate concepts, and a Gated Recurrent Unit (GRU) to extract semantic features of the associated mention sequences. Combining the two kinds of features, NormCG outputs a matching probability for each mention-candidate pair and predicts the normalized result of the mention from these probabilities. NormCG reaches 89.79% accuracy on the NCBI data set, surpassing machine learning models such as NormCo and TaggerOne, and the experimental analysis shows that effective representation learning improves the overall performance of the neural network.

(3) EM-TUGAM (EM-Train UMLS GRU Attention Mechanism Model), a hybrid concept normalization model that integrates traditional rule-based methods with deep learning. EM-TUGAM matches mentions against candidates incrementally, step by step: by routing each mention to whichever of the rule-based matching method and the deep learning method is more accurate on it, the model combines the strengths of both to improve normalization accuracy. On the MCN data set, EM-TUGAM reaches 77.9% accuracy, exceeding both the rule-based method and the deep learning method on their own, and also surpassing the MCN benchmark models. In addition, this thesis investigates improving the deep learning models through large-scale pre-training and attention mechanisms.

In conclusion, this thesis conducts in-depth research on medical text-oriented entity extraction and concept normalization. For entity extraction, deep learning models are used to learn text representations from multiple aspects, improving named entity recognition. For concept normalization, the thesis studies the design and construction of deep learning models, covering morphological and semantic feature extraction, the use of large-scale pre-trained models, and the introduction of attention mechanisms. The deep learning model NormCG and the hybrid model EM-TUGAM both achieve good experimental results, surpass existing models on their data sets, and yield perspectives for future model improvement through experimental analysis.
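The CRF layer in the BiLSTM-CRF family of models selects a globally optimal tag sequence rather than labeling each token independently; the core of its inference is Viterbi decoding over emission scores (from the BiLSTM) and learned transition scores. A minimal dependency-free sketch of that decoding step, with an illustrative BIO tag set and hand-picked scores (not the thesis's actual model or parameters):

```python
# Minimal Viterbi decoding, as performed by the CRF layer of a (Bi)LSTM-CRF
# tagger. Emission scores would normally come from the BiLSTM; here they are
# illustrative values for a three-token sentence.

def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   list of {tag: score} dicts, one per token
    transitions: {(prev_tag, tag): score} learned by the CRF
    tags:        the tag inventory, e.g. BIO tags for medical entities
    """
    # Initialization: score of each tag at the first token.
    scores = {t: emissions[0][t] for t in tags}
    backpointers = []

    for emit in emissions[1:]:
        new_scores, ptrs = {}, {}
        for t in tags:
            # Best previous tag under the learned transition scores.
            prev = max(tags, key=lambda p: scores[p] + transitions[(p, t)])
            new_scores[t] = scores[prev] + transitions[(prev, t)] + emit[t]
            ptrs[t] = prev
        backpointers.append(ptrs)
        scores = new_scores

    # Trace the best path backwards from the highest-scoring final tag.
    best = max(tags, key=scores.get)
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return list(reversed(path))


# Toy example: a transition penalty forbids jumping from O into I-Problem,
# so the decoder prefers a well-formed B-I span followed by O.
tags = ["O", "B-Problem", "I-Problem"]
transitions = {(p, t): (-5.0 if (p == "O" and t == "I-Problem") else 0.0)
               for p in tags for t in tags}
emissions = [
    {"O": 0.1, "B-Problem": 2.0, "I-Problem": 0.5},   # "chest"
    {"O": 0.2, "B-Problem": 0.3, "I-Problem": 2.5},   # "pain"
    {"O": 3.0, "B-Problem": 0.1, "I-Problem": 0.2},   # "today"
]
print(viterbi_decode(emissions, transitions, tags))
# → ['B-Problem', 'I-Problem', 'O']
```

The transition matrix is what lets the CRF rule out ill-formed tag sequences (such as I following O), which per-token classification cannot guarantee.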
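NormCG scores each mention-candidate pair by combining morphological (surface-form) and semantic features before predicting the best-matching concept. As a rough, dependency-free stand-in for the morphological side, character n-gram cosine similarity captures the same intuition of surface-form matching; the function names and scoring below are illustrative, not the thesis's CNN-based model:

```python
# A surface-form matching sketch in the spirit of NormCG's morphological
# features: compare character-trigram profiles of a mention and each
# candidate concept, then pick the highest-scoring candidate.
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram profile of a string, padded at the boundaries."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def morphological_similarity(mention, candidate):
    """Cosine similarity between character-trigram profiles (0.0 to 1.0)."""
    a, b = char_ngrams(mention), char_ngrams(candidate)
    dot = sum(a[g] * b[g] for g in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def normalize(mention, candidates):
    """Predict the candidate concept with the highest matching score."""
    return max(candidates, key=lambda c: morphological_similarity(mention, c))


candidates = ["myocardial infarction", "migraine"]
print(normalize("myocardial infarct", candidates))
# → 'myocardial infarction'
```

A learned model like NormCG goes further by also encoding the mention's context with a GRU, so that semantically related but morphologically dissimilar pairs can still be matched.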
Keywords/Search Tags: Entity Extraction, Named Entity Recognition, Clinical Concept Normalization, Deep Learning