
Research On Named Entity Extraction Method For Symptom Phenotype

Posted on: 2018-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Yuan
Full Text: PDF
GTID: 2348330512992044
Subject: Computer Science and Technology
Abstract/Summary:
Symptom phenotypes (symptoms and signs) are important information in clinical histories and the medical literature, and they are the main basis for diagnosis and treatment in both Chinese and Western medicine. However, symptom phenotype information in medical data is mostly carried by free-text clinical histories (with chief complaints and the present history of illness as the main text content) and by the medical literature. Named entity extraction of symptom phenotypes is therefore the key step in making use of this information. In recent years, named entity extraction from clinical medical records has become an active research direction, but the main extraction targets have been diseases, drugs and clinical problems, while research on the more complex task of phenotypic entity extraction remains insufficient. In view of the importance of symptom phenotype information in traditional Chinese medicine (TCM) diagnosis and treatment, this thesis uses TCM clinical histories (mainly the present history of illness) and PubMed bibliographic records to study methods for extracting symptom phenotype named entities. After constructing a large-scale corpus, we investigate methods based on bootstrapping, classification (conditional random fields and structured support vector machines) and feature learning (word embedding and network embedding). Our work covers the following three aspects:
(1) On the basis of manual review and data preprocessing, we construct a corpus containing 1,200 TCM clinical history records. On this corpus we develop unsupervised symptom phenotype extraction methods based on bootstrapping and conditional random fields (CRFs). The F1 values are 64.73% and 95.03% for bootstrapping and CRFs respectively, which indicates that CRFs basically meet the requirements of symptom phenotype named entity extraction from the clinical histories in this study. To test extraction performance in a completely open setting, we constructed cross-test corpora across different test cases, across chief complaints and present histories of illness, and across first-visit and referral-visit data, on which the CRFs attain 82%, 58.21% and 81.18% respectively. These results provide a reference for further study of named entity extraction methods.
(2) Building on deep feature representation methods (word embedding and network embedding), combined with structured support vector machine (SSVM) and CRF classification models, we develop several symptom phenotype extraction methods (the WENER and GENER methods) that integrate unlabeled clinical data. The F1 values of the WENER method are 98.08% (SSVM) and 97.63% (CRF). The F1 values of the GENER method based on word features are 88.42% and 86.01% respectively, while those of the GENER method based on phrase features are 95.04% and 95.00% respectively.
(3) To address phenotypic entity extraction in the medical literature, we conduct experiments with the WENER and GENER methods on PubMed bibliographic data. The F1 values of the WENER method are 93.58% and 93.23% respectively, while those of the GENER method are 93.57% and 92.04% respectively.
These studies show that phenotypic entity extraction based on deep representations has a clear advantage in integrating unlabeled corpora and in performance, and has practical value for both Chinese and English named entity extraction. Integrating larger-scale unlabeled corpora will provide a technical basis for high-performance extraction of various types of medical named entities, thus promoting the construction and development of large-scale medical knowledge graphs.
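The abstract reports F1 values without stating how they are computed. As a rough illustration only, the sketch below computes exact-match, entity-level precision, recall and F1 over BIO-tagged sequences, which is one common convention for named entity extraction. The tag name `SYM`, the function names and the exact-match criterion are assumptions made for this example; the thesis may score its results differently.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue an open entity of the same type
    is ignored (strict reading of the BIO scheme).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over a corpus."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)   # predicted spans that match gold exactly
        fp += len(p_spans - g_spans)   # predicted spans not in gold
        fn += len(g_spans - p_spans)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: two gold symptom entities, one of which is predicted.
gold = [["B-SYM", "I-SYM", "O", "B-SYM"]]
pred = [["B-SYM", "I-SYM", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

With the toy data, one of the two gold entities is recovered and nothing spurious is predicted, so precision is 1.0, recall is 0.5, and F1 is their harmonic mean, 2/3.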
Keywords/Search Tags: symptom phenotype, named entity extraction, present history of illness, PubMed, conditional random fields (CRFs), structured support vector machine (SSVM), word embedding, network embedding