Research On Chinese Medical Named Entity Recognition

Posted on: 2022-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: S H Zhong
Full Text: PDF
GTID: 2504306752954369
Subject: Master of Engineering
Abstract/Summary:
Named Entity Recognition (NER) extracts specific entities, such as the names of people or places, from unstructured text. In recent years, advances in computer technology have made valuable annotated electronic medical data available, which makes it possible to apply deep learning to information extraction and lays a foundation for intelligent medical care and knowledge graph construction. Compared with open-domain NER, NER in the medical field has received less attention. Medical Named Entity Recognition (MNER) has its own difficulties: entities require strong domain knowledge, some categories of entities are very long, and some entities mix Chinese and English. Although open-domain NER has been studied extensively, MNER still needs to “suit the remedy to the case”. To address these problems, the main work of this paper is as follows:

· Methods based on external knowledge enhancement. Given the strong domain knowledge and abundant external resources in the medical field, this paper explores how to use these resources and integrate them into the model. A medical dictionary is constructed and two methods of acquiring external knowledge are proposed: the feature template method and the segmentation tagging method. The feature template method extracts contextual text features, and the segmentation tagging method extracts entity position and label features; both are integrated into a Long Short-Term Memory (LSTM) network. Experimental results show that introducing external knowledge effectively increases the accuracy of entity recognition.

· Model improvement based on the self-attention mechanism. Long entities in the medical field are prone to fragmented predictions and boundary errors. This paper introduces a self-attention mechanism into the model, that is, the correlation between characters, to strengthen the cohesion of characters within a long entity and to alleviate fragmentation and boundary errors in long-entity prediction. Experimental results show that the overall recognition performance of the model improves and that prediction accuracy on long entities also increases significantly.

· NER based on cascade hierarchy recognition. To further improve the model, a cascade hierarchy recognition model is proposed that turns the original single-task NER into a multi-task model: one task identifies entity boundaries and the other predicts entity categories. Combined with the self-attention mechanism, this paper proposes two methods for integrating external knowledge into the model: the embedding layer fusion method and the layered fusion method. Experiments on the CCKS data set show that recognition performance improves. The layered fusion method performs better, improving the F1-score by 3.3% over the BiLSTM-CRF model. Treating NER as a multi-task cascade hierarchy recognition problem allows external knowledge to be integrated more effectively.
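The abstract does not spell out how the segmentation tagging method derives entity position and label features from the medical dictionary. One plausible reading can be sketched as follows, assuming forward maximum matching against the dictionary and a B/M/E/S position scheme; both choices are assumptions for illustration, not details confirmed by the thesis. The function name `dictionary_tag_features`, the `max_len` parameter, and the lexicon entries and labels (`DRUG`, `SYMPTOM`) are hypothetical.

```python
from typing import Dict, List


def dictionary_tag_features(
    text: str, lexicon: Dict[str, str], max_len: int = 8
) -> List[str]:
    """Return one position-and-label tag per character of `text`.

    Assumption: forward maximum matching (longest dictionary entry first)
    with B/M/E/S position tags; unmatched characters are tagged "O".
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + length])
            if label is None:
                continue
            if length == 1:
                tags[i] = f"S-{label}"  # single-character entity
            else:
                tags[i] = f"B-{label}"  # first character
                for j in range(i + 1, i + length - 1):
                    tags[j] = f"M-{label}"  # interior characters
                tags[i + length - 1] = f"E-{label}"  # last character
            i += length
            matched = True
            break
        if not matched:
            i += 1  # no dictionary entry starts here
    return tags


# Hypothetical two-entry medical dictionary (entity string -> category).
lexicon = {"阿司匹林": "DRUG", "头痛": "SYMPTOM"}
features = dictionary_tag_features("患者头痛服用阿司匹林", lexicon)
for char, tag in zip("患者头痛服用阿司匹林", features):
    print(char, tag)
```

In a setup like this, each tag would typically be mapped to an embedding and combined with the character embedding before the LSTM layer, which is one way the entity position and label features could be integrated into the network.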
Keywords/Search Tags: Named Entity Recognition, External Knowledge, Vocabulary Enhancement, Self-attention Mechanism, Electronic Medical Records