Research On Part-of-speech Tagging For Chinese Electronic Medical Records

Posted on:2015-09-15

Degree:Master

Type:Thesis

Country:China

Candidate:F F Zhao

Full Text:PDF

GTID:2298330422490922

Subject:Computer Science and Technology

Abstract/Summary:


With the advent of the big data era, “Smart Healthcare” has become the development trend of the global health care industry. As carriers of medical informatization, electronic medical records (EMRs) contain a large amount of medical and health knowledge. That knowledge can serve fields such as medical diagnosis, personal health management and medical coordination. Mining it depends on natural language processing (NLP) and information extraction technology. Part-of-speech (POS) tagging is a foundational NLP task, so research on POS tagging for Chinese electronic medical records (CEMRs) supports follow-up tasks such as parsing and information extraction.

Research on Chinese word segmentation and POS tagging for CEMRs is currently at a blank stage because no annotated CEMR corpus exists. Unlike conventional text, CEMRs contain many professional terms, acronyms and fixed language patterns. As a result, a POS tagging model trained on general-domain text cannot be applied directly to CEMR POS tagging.

To support research on CEMR POS tagging, this thesis constructs a CEMR corpus annotated for word segmentation and POS tagging. We propose a scheme covering every step from data preprocessing to corpus annotation in order to achieve higher annotation consistency; the scheme offers guidance for building larger, higher-quality CEMR corpora. Furthermore, we quantify the statistical lexical differences between CEMRs, an open-domain corpus and English electronic health records, and we perform a systematic error analysis of a POS tagging model trained on the open-domain corpus. This work lays the foundation for research on NLP technologies for CEMRs.

Based on this corpus analysis, we propose, for the first time, a POS tagging model suited to CEMRs.
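One simple way to quantify the lexical difference between CEMR text and an open-domain corpus is the out-of-vocabulary (OOV) rate, meaning the share of CEMR tokens never seen in the open-domain vocabulary, together with a comparison of POS tag distributions. The sketch below is a minimal illustration under stated assumptions, not the thesis's own measurement: the toy vocabulary, the example sentence and the Penn Chinese Treebank-style tags (`NN`, `NT`, `VV`, `CD`, `PU`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def oov_rate(test_tokens: List[str], train_vocab: Set[str]) -> float:
    """Fraction of test tokens whose word form never appears in train_vocab."""
    if not test_tokens:
        return 0.0
    unseen = sum(1 for w in test_tokens if w not in train_vocab)
    return unseen / len(test_tokens)


def tag_distribution(tagged: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Relative frequency of each POS tag in a sequence of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical toy open-domain vocabulary: "patient", "today", "appear", comma, full stop.
open_domain_vocab = {"患者", "今日", "出现", "，", "。"}

# Hypothetical tagged CEMR sentence:
# "The patient today developed chest tightness, BP 140/90mmHg."
cemr_sentence = [("患者", "NN"), ("今日", "NT"), ("出现", "VV"),
                 ("胸闷", "NN"), ("，", "PU"), ("BP", "NN"),
                 ("140/90mmHg", "CD"), ("。", "PU")]

words = [w for w, _ in cemr_sentence]
print(f"OOV rate: {oov_rate(words, open_domain_vocab):.3f}")
print(tag_distribution(cemr_sentence))
```

On this toy input, three of the eight tokens (the symptom 胸闷 "chest tightness", the acronym BP and the measurement 140/90mmHg) are unseen, giving an OOV rate of 0.375. Domain terms, acronyms and measurement patterns of this kind are exactly what a general-domain tagger has never observed.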
The model works in two stages. First, a character-based joint word segmentation and POS tagging model produces a preliminary tagging of the raw sentence; tagging jointly avoids the error propagation of a segment-then-tag pipeline and lets POS information improve segmentation. Second, to exploit the language patterns that CEMRs contain, rules learned with the transformation-based error-driven learning method revise the preliminary output and improve POS tagging accuracy. For the cross-domain annotation problem, POS tagging is further improved by adjusting the weights of features that appear only in CEMRs. Our system achieves F1-scores of 94.75% and 93.82% on the test set of the manually annotated CEMR corpus.
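The two stages can be sketched in three small pieces. The first decodes per-character joint tags (a word-position prefix `B`/`M`/`E`/`S` combined with a POS tag, e.g. `B-NN`) into words. The second applies an ordered list of transformation rules to revise the preliminary tags. The third scores the result with word-level F1, both with and without POS. This is a hypothetical sketch, not the thesis's model. The tag scheme, the single rule template ("change tag A to B when the previous word is tagged C"), the example rule and the data are all assumptions. The trained joint tagger and the rule-learning loop are omitted.

```python
from typing import List, Set, Tuple

Tagged = List[Tuple[str, str]]  # list of (word, POS) pairs
Rule = Tuple[str, str, str]     # hypothetical template: (from_tag, to_tag, previous_tag)


def decode(chars: str, tags: List[str]) -> Tagged:
    """Turn per-character joint tags such as 'B-NN', 'E-NN', 'S-PU' into (word, POS) pairs.

    The POS of each word is taken from its last character's tag.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: Tagged = []
    buf, pos = "", ""
    for ch, tag in zip(chars, tags):
        boundary, pos = tag.split("-", 1)
        buf += ch
        if boundary in ("E", "S"):  # this character ends a word
            words.append((buf, pos))
            buf = ""
    if buf:  # tolerate an unterminated final word
        words.append((buf, pos))
    return words


def apply_rules(words: Tagged, rules: List[Rule]) -> Tagged:
    """Apply transformation rules in their learned order.

    Each rule is matched against the tags as they stood before that rule ran,
    so a single rule cannot re-trigger on its own changes further along the sentence.
    """
    out = list(words)
    for from_tag, to_tag, prev_tag in rules:
        before = [t for _, t in out]
        for i in range(1, len(out)):
            if before[i] == from_tag and before[i - 1] == prev_tag:
                out[i] = (out[i][0], to_tag)
    return out


def spans(words: Tagged, with_pos: bool) -> Set[tuple]:
    """Character-offset span of each word, optionally paired with its POS tag."""
    result, start = set(), 0
    for word, tag in words:
        end = start + len(word)
        result.add((start, end, tag) if with_pos else (start, end))
        start = end
    return result


def f1(gold: Tagged, pred: Tagged, with_pos: bool) -> float:
    """Word-level F1: a predicted word counts only if its span (and tag, if with_pos) matches gold."""
    g, p = spans(gold, with_pos), spans(pred, with_pos)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "患者咳嗽。" ("The patient coughs.")
chars = "患者咳嗽。"
stage1_tags = ["B-NN", "E-NN", "B-NN", "E-NN", "S-PU"]  # assumed output of a joint tagger
preliminary = decode(chars, stage1_tags)
rules = [("NN", "VV", "NN")]  # hypothetical learned rule: NN -> VV after another NN
revised = apply_rules(preliminary, rules)
gold = [("患者", "NN"), ("咳嗽", "VV"), ("。", "PU")]

print(preliminary)
print(revised)
print(f1(gold, preliminary, with_pos=False), f1(gold, preliminary, with_pos=True))
print(f1(gold, revised, with_pos=True))
```

Here the segmentation F1 of the preliminary output is already 1.0. The joint word-and-POS F1 rises from about 0.667 to 1.0 once the rule retags 咳嗽 ("cough") as a verb. In a real system, transformation-based learning would produce the rule list from training data. It does so by repeatedly choosing the candidate rule that fixes the most tagging errors, net of the errors that rule introduces.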

Keywords/Search Tags:

EMR, corpus construction, POS tagging, joint model, cross-domain annotation


Related items

1	Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus
2	Building And Evaluating Special Domain Comparable Corpus
3	Research On The Construction Of Uygur, Kazak And Kirgiz Public Opinion Tagging Corpus Based On Crowdsourcing
4	Construction Of Chinese Theme-Rheme Annotation Corpus And Study Of Automatic Analysis Of Chinese Theme-Rheme Structure
5	Research On The Construction Method Of Streaming Document Corpus Oriented To Structure Understanding
6	Research On Active Learning Based Automatic Corpus Annotation
7	Research On Cross-Domain Construction Method Of Domain-Oriented Sentiment Lexicon Based AF Model
8	Categorization Corpus Construction And Research On Classification Method For Short Text
9	Annotation syntaxico-semantique des actants en corpus specialise (Syntactico-semantic annotation of actants in a specialized corpus)
10	Research On Cross-domain Object Detection In Remote Sensing Images