Research On Relation Extraction For Chinese Electronic Medical Records

Posted on: 2017-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cheng
GTID: 2348330503987188
Subject: Computer Science and Technology
Abstract/Summary:
As medical and health services become increasingly digitized and intelligent, electronic medical records (EMRs) are starting to play an important role in the healthcare industry. EMRs contain detailed records of individual treatment progress and a large amount of medical knowledge, so it is important to extract and use this knowledge effectively. The main approach to mining knowledge from EMRs is information extraction, in which relation extraction plays an important role.

This thesis focuses on relation extraction in Chinese electronic medical records (CEMRs). Besides a large amount of medical knowledge, CEMRs also contain protected personal information (PHI) about patients and doctors. To protect this information, PHI must be located and replaced with irrelevant information. An annotation guideline is developed and 100 CEMRs are manually annotated; a CRF model is then trained to locate PHI in CEMRs, and results show that the F value of PHI recognition reaches 96.9%.

For the relation extraction research, 992 de-identified, annotated CEMRs are used. A feature-based relation extraction method is implemented first: basic features and features specific to CEMRs are extracted to train a single SVM classifier. The experiments show that some samples of one main relation category are misclassified into other main relation categories. To avoid this misclassification, the single classifier is partitioned into multiple classifiers, each of which handles only samples of a specified main relation category. With this modified method, the F value of relation recognition reaches 73.4%, and the time needed to train and test the model is reduced.

Texts in CEMRs are usually similar when they contain entity pairs of the same relation category. A tree-kernel-based method is therefore explored to extract relations in CEMRs from the perspective of text similarity. Samples are transformed into parse trees, and a kernel function is computed by counting the number of subset trees shared by two samples. Multiple SVM classifiers are trained using this subset tree kernel, and experiments show that the F value of relation recognition reaches 61.4%.

Because both text features and text similarity in CEMRs are important for relation extraction, the feature spaces of the feature-based method and the tree-kernel-based method are extended so that the two methods can be combined into a new method for extracting relations in CEMRs, and a tradeoff parameter is introduced to adjust the new method. Experimental results show that the method combining features and the tree kernel achieves the best relation recognition performance, with an F value of 75.9%. However, the method performs poorly on some relation categories, and further improvement is still necessary.
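
The subset tree kernel and the tradeoff parameter described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation. It assumes the standard recursive way of counting shared subset trees, with a decay factor, and it assumes the two methods are combined as a weighted sum of a normalized tree kernel and a cosine similarity over feature vectors. The abstract only says that a tradeoff parameter is introduced, so the exact form of the combination is a guess. The names `tree_kernel`, `composite_kernel`, `alpha`, and `decay` are hypothetical, and the toy trees are English stand-ins for parsed CEMR sentences.

```python
import math

# A parse tree is a nested tuple: (label, child, child, ...); words are plain strings.

def nodes(tree):
    """Yield every internal (non-word) node of a tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Label of a node plus the labels of its children (words are marked as leaves)."""
    return (node[0],) + tuple(
        c[0] if isinstance(c, tuple) else ("leaf", c) for c in node[1:]
    )

def delta(n1, n2, decay):
    """Weighted count of subset trees rooted at both n1 and n2 (assumed recursion)."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):  # equal productions imply c2 is a tuple as well
            score *= 1.0 + delta(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Sum delta over all node pairs; with decay=1.0 this counts shared subset trees."""
    return sum(delta(a, b, decay) for a in nodes(t1) for b in nodes(t2))

def normalized_tree_kernel(t1, t2, decay=1.0):
    """Scale the tree kernel into [0, 1] so that tree size does not dominate."""
    denom = math.sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0

def cosine(u, v):
    """Cosine similarity between two flat feature vectors."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

def composite_kernel(s1, s2, alpha=0.5, decay=1.0):
    """Assumed combination: alpha weights the tree kernel, 1 - alpha the features."""
    (t1, f1), (t2, f2) = s1, s2
    return alpha * normalized_tree_kernel(t1, t2, decay) + (1.0 - alpha) * cosine(f1, f2)

# Toy samples: (parse tree, feature vector).
sample_a = (("NP", ("D", "the"), ("N", "dog")), [1.0, 0.0])
sample_b = (("NP", ("D", "the"), ("N", "cat")), [1.0, 0.0])
```

Under these assumptions, setting `alpha` to 1.0 reduces the kernel to the tree kernel alone and setting it to 0.0 reduces it to the feature similarity alone. A Gram matrix of `composite_kernel` values over all training pairs could then be passed to an SVM that accepts precomputed kernels.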
Keywords/Search Tags: Chinese electronic medical records, relation extraction, SVM, tree kernel