
Statistical Model Based Chinese Named Entity Recognition Methods And Its Application To Medical Records

Posted on: 2018-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xu
Full Text: PDF
GTID: 2348330518994899
Subject: Software engineering
Abstract/Summary:
Many natural language processing tasks depend on accurate and effective named entity recognition. Progress in named entity recognition is often constrained by the state of natural language processing technology, and vice versa. Research on Chinese named entity recognition started much later than research on English, and written Chinese has no explicit word separator, so recognizing Chinese named entities is more difficult. In addition, the medical field has a large number of specialized lexical and syntactic features, which raises the threshold for research on Chinese named entity recognition in this field. This thesis summarizes existing named entity recognition methods and then proposes a more reliable method based on statistical models. Because there is no open, unified medical corpus, the methods currently applied to the medical field each rely on their own manually annotated training data. Inspired by the fine-tuning technique used to train deep learning models, we apply fine-tuning to statistical models: starting from a statistical model trained on an annotated news corpus, we fine-tune it in combination with a professional medical dictionary. This achieves good performance on named entity recognition in Chinese electronic clinical records. The method effectively reduces the annotation workload required to train a model in the early stage of named entity recognition, and it avoids the subjective bias introduced by manually tagging a training corpus. The experimental results show that the proposed optimization algorithm is effective for both the hidden Markov model and the conditional random field model: accuracy improves by 6.8% and 10.5% respectively, and recall improves by 8.9% and 11.1% respectively. Finally, based on the recognition results for 1066 real Chinese clinical records, a method combining rules and a dictionary is applied to extract the key information of the medical records. According to medical logic rules, the potential information in this key information is analyzed, and from the experimental process described above a set of feasible research methods is summarized and explored.
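The abstract does not say how the medical dictionary is combined with the statistical model. As one illustration only, and not the thesis's actual procedure, the sketch below overlays dictionary matches on the BIO tags produced by some upstream HMM or CRF tagger, using forward maximum matching. The function name `apply_dictionary`, the label `MED`, the one-entry lexicon, and the sample sentence are all hypothetical.

```python
def apply_dictionary(chars, tags, lexicon, label="MED"):
    """Overlay dictionary matches onto per-character BIO tags.

    Assumption: `tags` come from an upstream statistical tagger (HMM or CRF).
    This overlay is an illustrative stand-in, not the method of the thesis.
    """
    out = list(tags)
    max_len = max((len(word) for word in lexicon), default=0)
    i = 0
    while i < len(chars):
        match = 0
        # Forward maximum matching: try the longest candidate first.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            if "".join(chars[i:i + n]) in lexicon:
                match = n
                break
        if match == 0:
            i += 1
            continue
        out[i] = "B-" + label
        for j in range(i + 1, i + match):
            out[j] = "I-" + label
        end = i + match
        # If the match cut an existing entity in two, restart its remainder
        # with a B- tag so the sequence stays valid BIO.
        if end < len(out) and out[end].startswith("I-"):
            out[end] = "B-" + out[end][2:]
        i = end
    return out


# Hypothetical input: the upstream tagger found no entities in this sentence.
chars = list("患者有糖尿病史")
tags = ["O"] * len(chars)
lexicon = {"糖尿病"}
corrected = apply_dictionary(chars, tags, lexicon)
```

Here the three characters of the dictionary term are retagged as one `MED` entity and every other tag is left as the tagger produced it. Letting the dictionary always win is a deliberate simplification; a real system would need a policy for conflicts between dictionary matches and confident model predictions.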
Keywords/Search Tags:Chinese clinical medical records, Named Entity Recognition, Hidden Markov Model, Conditional Random Fields, Model optimization, Key information extraction