Research On Cross-department Chunking Based On Chinese Electronic Medical Record

Posted on: 2017-01-09  Degree: Master  Type: Thesis
Country: China  Candidate: X Dai
GTID: 2308330503487208  Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, "Internet +" has become a hot topic in society, and "Internet + Health" is a new online health care model that the government is actively promoting. The model includes important initiatives to build electronic medical records, which in turn produce a large volume of medical data. Chinese electronic medical records (CEMRs) are among the most important of these information sources because they contain a wealth of patients' personal medical information. Fully learning the health knowledge in CEMRs with the help of natural language processing technology will promote the long-term development of smart health care. The work in this thesis is as follows:

(1) Adapting the PCTB corpus annotation guideline to Chinese electronic medical records and building an annotated chunking corpus. With reference to the PCTB annotation guideline, and taking the characteristics of Chinese electronic medical records into account, we propose suitable modifications and complementary norms. Building on our team's earlier work, and combining automatic and manual annotation, we obtain 306 Chinese electronic medical records annotated with word segmentation, part-of-speech tags, and chunks. Annotation consistency on the corpus reaches 98%.

(2) Developing a cross-department chunking algorithm for Chinese electronic medical records based on SCL. Starting from the SCL algorithm, we discretize the correlations between the generated variables. The improved algorithm raises the F value by about 1% on both the POS tagging and chunking tasks.

(3) Developing a cross-department chunking algorithm for Chinese electronic medical records based on TrAdaBoost. Starting from the TrAdaBoost algorithm, we adapt it to multi-class classification and propose an auxiliary corpus selection algorithm that uses active learning to screen the annotated auxiliary corpus for the target department. In three cross-validation experiments, TrAdaBoost raises the F value by 5% or more on average, and the auxiliary selection algorithm adds a further average gain of about 0.6%.

(4) Combining the two algorithms for dual transfer learning, so that knowledge is transferred at both the feature level and the instance level. With the auxiliary selection algorithm included, the final result is still significantly better than the baseline, which shows that the approach has practical value.

In summary, this thesis builds a chunking corpus on Chinese electronic medical records, applies two different transfer learning algorithms to cross-department chunking, proposes an improvement to each algorithm and validates both experimentally, and finally combines the two methods into a dual transfer learning algorithm and demonstrates its usability.
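To make the instance-transfer idea in item (3) concrete, the following is a minimal sketch of one weight update in the original binary TrAdaBoost formulation. It is not the multi-class variant or the auxiliary selection step developed in the thesis, whose details the abstract does not give. The function name `tradaboost_reweight` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def tradaboost_reweight(weights, mistakes, n_source, n_rounds):
    """One binary TrAdaBoost-style weight update (illustrative sketch).

    weights  : positive floats; the first n_source entries belong to the
               auxiliary (source-department) instances, the rest to the
               target-department instances.
    mistakes : 0/1 flags, 1 where the current weak learner was wrong.
    n_rounds : total number of boosting rounds planned.
    Returns (new_weights, eps), where eps is the weighted target error.
    """
    target_w = weights[n_source:]
    target_m = mistakes[n_source:]

    # Weighted error measured on the target instances only.
    eps = sum(w * m for w, m in zip(target_w, target_m)) / sum(target_w)
    if not 0.0 < eps < 0.5:
        raise ValueError("target error must lie strictly between 0 and 0.5")

    # Fixed factor for source instances, per-round factor for target ones.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    beta_tgt = eps / (1.0 - eps)

    new_weights = []
    for i, (w, m) in enumerate(zip(weights, mistakes)):
        if i < n_source:
            # Misclassified source instances are down-weighted.
            new_weights.append(w * beta_src ** m)
        else:
            # Misclassified target instances are up-weighted.
            new_weights.append(w * beta_tgt ** (-m))
    return new_weights, eps


# Four auxiliary and four target instances, one mistake in each group.
new_w, eps = tradaboost_reweight(
    weights=[1.0] * 8,
    mistakes=[1, 0, 0, 0, 1, 0, 0, 0],
    n_source=4,
    n_rounds=10,
)
```

The asymmetry is the point of the design: auxiliary instances that disagree with the current hypothesis are assumed to be less relevant to the target department and gradually lose influence, while hard target instances gain weight as in ordinary AdaBoost.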
Keywords/Search Tags: Chinese electronic medical records, corpus construction, chunking, transfer learning