Research On Chinese Named Entity Recognition In Medical Field

Posted on: 2018-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: T Z Xue
GTID: 2348330533469797
Subject: Computer technology
Abstract/Summary:
With the explosive growth of text data and the establishment and popularization of large-scale knowledge bases in recent years, named entity recognition has become a hot topic in natural language processing. However, traditional named entity recognition methods are all based on supervised learning and require large-scale annotated corpora. They therefore perform poorly in the medical field, where annotated corpora are scarce. With the development and popularization of deep learning, RNN (Recurrent Neural Network) models, especially those built on LSTM (Long Short-Term Memory) units, have been widely used in natural language processing and have achieved results much better than traditional methods in many tasks. We therefore first apply an LSTM model to named entity recognition in the medical field, and show that it outperforms the traditional Conditional Random Field (CRF) model in both research evaluation and practical application. Because annotated corpora are scarce in the medical field, we also want the LSTM model to learn external information, covering both linguistic features from the general domain and unsupervised semantic information from the medical domain, so that it can improve further on a result that is already much better than that of the CRF model. To this end, we use pre-training and transfer learning, which further improve the model's performance. Finally, because of the shortcomings of the LSTM model in practical applications, we explore another method for domain-adaptive named entity recognition. We investigate the differences between the two language domains and their effects through grouped entity recognition experiments in which the training data mix annotated corpora from the medical and general domains. We also use a GBDT (gradient boosting decision tree) model that integrates the domain difference with semantic vectors trained by unsupervised learning on medical text, and it achieves a better effect in practical application.
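
The grouped experiments on mixed training data can be pictured with a minimal sketch. The abstract does not say how the groups were formed, so everything here is an assumption for illustration: the helper name `mix_corpora`, the mixing ratios, the toy sentences, and the label name `SYM` are all hypothetical. The sketch keeps every medical sentence, adds a sampled share of general-domain sentences, and attaches a domain flag that a downstream model could use as a domain-difference feature.

```python
import random

# Toy character-level sentences with BIO labels. The sentences and the
# label name "SYM" are made up for illustration only.
MEDICAL = [
    [("患", "O"), ("者", "O"), ("头", "B-SYM"), ("痛", "I-SYM")],
    [("咳", "B-SYM"), ("嗽", "I-SYM"), ("三", "O"), ("天", "O")],
]
GENERAL = [
    [("今", "O"), ("天", "O"), ("下", "O"), ("雨", "O")],
    [("他", "O"), ("去", "O"), ("上", "O"), ("班", "O")],
    [("我", "O"), ("喜", "O"), ("欢", "O"), ("茶", "O")],
    [("门", "O"), ("已", "O"), ("关", "O"), ("好", "O")],
]


def mix_corpora(medical, general, general_ratio, seed=0):
    """Return all medical sentences plus a share of the general ones.

    Each item is (sentence, is_medical), where is_medical is 1 or 0.
    This is a hypothetical mixing scheme, not the thesis's protocol.
    """
    if not 0.0 <= general_ratio <= 1.0:
        raise ValueError("general_ratio must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the groups reproducible
    n_general = int(len(general) * general_ratio)
    sampled = rng.sample(general, n_general)
    mixed = [(sent, 1) for sent in medical] + [(sent, 0) for sent in sampled]
    rng.shuffle(mixed)
    return mixed


# One experiment group per mixing ratio (the ratios are hypothetical).
groups = {ratio: mix_corpora(MEDICAL, GENERAL, ratio) for ratio in (0.0, 0.5, 1.0)}

for ratio, data in groups.items():
    n_medical = sum(flag for _, flag in data)
    print(ratio, len(data), n_medical)
```

Training the same model on each group and comparing medical-domain scores would then show how much general-domain data helps or hurts, which is the kind of comparison the abstract describes.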
Keywords/Search Tags: named entity recognition, LSTM, transfer learning, GBDT model, practical application effect