
Research On Named Entity Recognition For Chinese Electronic Medical Records

Posted on: 2016-08-28 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Qu | Full Text: PDF
GTID: 2308330479490074 | Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of information technology (IT) has advanced hospital informatization, and national policy support has laid the foundation for hospital information systems (HIS) such as electronic medical record systems (EMRS). As a result, the medical community now produces a large amount of data, among which electronic medical records (EMRs) have attracted considerable attention. EMRs are an important clinical information resource generated in the course of medical activities, and they contain a wealth of medical knowledge closely related to patients' health. Mining the knowledge in EMRs will promote the development of medical treatment. Against this background, the main research work is as follows:

(1) Developing an annotation guideline for Chinese electronic medical records (CEMRs) and establishing an annotated corpus of named entities in CEMRs. With reference to the definitions of named entity types and modification types for electronic medical records given by the US Informatics for Integrating Biology and the Bedside (I2B2) in 2010, an annotation guideline for CEMRs was developed under the guidance of professional doctors. Based on the analysis of a large number of CEMRs, a complete annotation scheme for named entities in CEMRs was proposed, and an annotated corpus of 992 CEMRs was established through pre-annotation followed by formal annotation. The annotation consistency is over 92%.

(2) Developing named entity recognition (NER) for CEMRs based on supervised learning. Maximum entropy (ME), conditional random field (CRF), and structural support vector machine (SSVM) models are used to build the NER system. In addition, features of CEMRs, dictionary features, and word cluster features are introduced. Given the lack of medical dictionaries and knowledge resources in Chinese, the paper develops a small-scale CEMR dictionary to assist NER. The paper constructs word vectors from 3,734 CEMRs and compares the performance of K-means and GAAC clustering. With the extended features, the SSVM model performs better than the other models, reaching an F value of 92.87%.

(3) Developing NER based on combined classifiers. The paper constructs multiple combined classifiers with Bagging and Stacking to improve NER performance. The Stacking-based combination of CRF and SSVM performs best, with an F value of 92.97%.

In summary, the paper develops an annotation guideline for CEMRs and establishes an annotated corpus of named entities in CEMRs. An NER system based on three supervised learning methods is implemented, and the extended features and the classifier combination algorithms improve its performance. Compared with the work of other teams, the system developed in the paper is stronger in entity definition, corpus scale, and recognition performance.
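
The F values reported above are the usual summary metric for NER. As a minimal sketch of how such a score is commonly computed, the Python below derives entity-level precision, recall, and F1 from BIO tag sequences, counting a predicted entity as correct only when both its boundaries and its type match the gold annotation. The abstract does not state the thesis's exact tagging scheme or matching criterion, so the BIO encoding, the exact-match rule, the function names `bio_to_spans` and `entity_f1`, and the example labels (`problem`, `test`, `treatment`) are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans.

    Assumes tags look like "B-type", "I-type", or "O". An "I-" tag that
    does not continue an open span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return exact-match entity-level (precision, recall, F1)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical five-token example: the first entity matches exactly,
# the second has the right boundaries but the wrong type.
gold_tags = ["B-problem", "I-problem", "O", "B-test", "O"]
pred_tags = ["B-problem", "I-problem", "O", "B-treatment", "O"]
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

In this example one of the two predicted entities is correct and one of the two gold entities is recovered, so precision, recall, and F1 are each 0.5. In practice the counts would be accumulated over every sentence in the test set before the ratios are taken.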
Keywords/Search Tags: CEMRs, NER, annotation guideline, classifiers combination