Research On Information Extraction Technology For Chinese Medical Text

Posted on: 2020-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li
GTID: 2404330623456433 | Subject: Software engineering
Abstract/Summary:
With the advance of medical informatization, the number of Chinese electronic medical records keeps growing. These records contain all kinds of medical information generated during patient visits and hold abundant medical knowledge, so mining and using that knowledge effectively is important for the informatization and intelligent development of medical and health services. Unlike text in the general domain, Chinese electronic medical records have many inherent characteristics that make information extraction difficult. Building on a thorough survey of general-domain information extraction, this thesis carries out the following research on information extraction in the Chinese medical field.

Firstly, traditional named entity recognition methods require a large number of manually constructed features. To address this, the thesis designs a medical named entity recognition algorithm based on word tagging. Part-of-speech, dictionary, and other features are added to the word vectors to enrich the original representation, and a CRF layer is added after the output layer of the BiLSTM to learn the dependencies between tags. Experimental results on self-built data sets show that the proposed BiLSTM-CRF method based on word tagging can effectively learn sentence representations and improve medical entity recognition.

Secondly, a feature fusion-based method is proposed to deal with the limited amount of tagged data for medical entity relations. The lexical and syntactic features of sentences are first extracted as basic features, and a support vector machine model is used to construct multiple classifiers that predict the relation category of medical entity pairs. Then, based on an analysis of the characteristics of medical texts, extended features such as interval information, descriptor and negative word information, and the nearest syntactically dependent verb are added to the basic features to improve the recognition of Chinese medical entity relations.

Finally, feature-based methods cannot learn deep semantic information, and a small tagged corpus is not well suited to deep learning. To address both problems, the thesis adopts a distant supervision approach: it automatically builds a large number of training examples from an untagged corpus by combining a knowledge base with rule constraints, and proposes a CNN with a word attention mechanism, so that the importance of different words to relation classification is fully considered during model training. Experiments show that the proposed model significantly improves relation extraction.
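The distant supervision step mentioned in the abstract (building training examples automatically from an untagged corpus by combining a knowledge base with rule constraints) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `KB` entries, the English example sentences, the `MAX_GAP` distance rule, and the names `Example` and `label_sentences` are hypothetical and are not taken from the thesis, which does not specify its knowledge base or rules in this abstract.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation label.
KB = {
    ("aspirin", "headache"): "treats",
    ("smoking", "lung cancer"): "causes",
}

# Assumed rule constraint: the two mentions must be at most this many
# characters apart, otherwise the sentence is not used as a training example.
MAX_GAP = 40


@dataclass
class Example:
    sentence: str
    head: str
    tail: str
    relation: str


def label_sentences(sentences, kb=KB, max_gap=MAX_GAP):
    """Label every sentence that mentions both entities of a KB pair.

    This is the basic distant supervision assumption: if a sentence contains
    both entities of a known pair, it is taken to express their relation.
    The distance rule filters out some of the noisy matches.
    """
    examples = []
    for sent in sentences:
        text = sent.lower()
        for (head, tail), relation in kb.items():
            h, t = text.find(head), text.find(tail)
            if h < 0 or t < 0:
                continue  # at least one entity is not mentioned
            # Characters between the end of the earlier mention
            # and the start of the later one.
            if h < t:
                gap = t - (h + len(head))
            else:
                gap = h - (t + len(tail))
            if gap < 0 or gap > max_gap:
                continue  # overlapping mentions or too far apart
            examples.append(Example(sent, head, tail, relation))
    return examples


corpus = [
    "Aspirin is often used for headache.",
    "Smoking was discussed. " + "x" * 60 + " lung cancer",
    "No entities here.",
]
labelled = label_sentences(corpus)
```

In this sketch only the first sentence becomes a training example; the second is rejected by the distance rule even though both entities appear. Examples produced this way are noisy, which is one motivation for weighting words differently in the downstream classifier, as the attention-based CNN in the abstract does.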
Keywords/Search Tags: Electronic Medical Record, Information Extraction, Deep Learning, Distant Supervision Learning