
Research On Concept Extraction In Electronic Medical Records

Posted on: 2014-08-19    Degree: Master    Type: Thesis
Country: China    Candidate: B Y Deng    Full Text: PDF
GTID: 2268330422450614    Subject: Computer Science and Technology
Abstract/Summary:
With increasing government investment in health information systems, information extraction from EMRs (Electronic Medical Records) has drawn more and more scholars' attention. Knowledge contained in EMRs can be used in medical diagnostics, users' health plan setting, medical question-answering systems and other areas. As the basis of information processing, concept extraction is indispensable for EMR knowledge mining.

Compared with traditional text, EMRs have their own characteristics. They contain a great deal of professional terminology, medical idioms, diagnostic results made up of numbers and units, abbreviations, and incomplete sentences with strong patterns. They also organize the various parts of their content in a semi-structured way. In addition, because medical records involve patient privacy, no large public collections of them are available so far. These characteristics increase the difficulty of extracting concepts from electronic medical records. In the 2010 i2b2/VA challenge, the best system in the concept extraction task reached an F value of 0.8523, so there is still a gap between concept extraction in EMRs and traditional named entity recognition.

In order to extract relevant concepts from EMRs more precisely, this thesis used the CRF, maximum entropy, and MIRA learning algorithms with basic features to establish baseline systems for the concept extraction task. In the experiment with maximum entropy, token-level classification results were significantly better than concept-level extraction results. The label of the preceding word was therefore added to the feature vector, which greatly improved concept identification based on maximum entropy.

To address the characteristics of electronic medical records, the thesis studied feature expansion, combined-classifier methods, and the use of labeled data from other domains to improve the performance of concept extraction. For feature expansion, the characteristics of EMRs and health-related resources were taken into account, and each added feature increased the F value of concept extraction by about one percentage point. Moreover, semantic distribution was mined through a classification method, which helped increase the F value by 2%. For combined classifiers, the bagging method was combined with an online learning algorithm, and the integrated classifier's predictions under the stacking strategy were remarkable, bringing the system's F value to 91.1%. For the use of other domains' resources, an instance-based learning method was applied so that medical data from different institutions, as well as biological literature data, could be used for the target EMR concept extraction task. The results show that when the target-domain data are appropriately small in scale, the effect of transfer learning is obvious.
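
The abstract reports F values for concept extraction and notes that token-level classification scored better than concept-level extraction. The following minimal Python sketch illustrates how the two measures can diverge. It assumes BIO-style tags and exact-match scoring of concept spans; the abstract states neither the tagging scheme nor the matching criterion. The function names and the example tags are hypothetical and are not taken from the thesis.

```python
from typing import List, Set, Tuple

# Assumption: a concept is a (start, end_exclusive, type) span in a BIO tag sequence.
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect concept spans from BIO tags such as B-problem, I-problem, O."""
    spans: Set[Span] = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and ctype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-"):
            start, ctype = i, tag[2:]
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F value over concept spans (boundaries and type must agree)."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical six-token sentence; the prediction truncates one concept by a token.
    gold = ["B-problem", "I-problem", "I-problem", "O", "B-test", "O"]
    pred = ["B-problem", "I-problem", "O", "O", "B-test", "O"]
    print("token accuracy:", token_accuracy(gold, pred))
    print("concept-level F:", span_f1(gold, pred))
```

In this toy case a single wrong token leaves five of six tokens correct, yet one of the two concepts no longer matches exactly, so the concept-level F value falls well below the token accuracy. A boundary-sensitive measure of this kind is one plausible reason that adding the preceding word's label as a feature would help a token-by-token classifier.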
Keywords/Search Tags: EMR, concept extraction, feature expansion, combined classifiers, transfer learning