Named entity recognition (NER) is the task of extracting valuable entities from free text, such as times, places, persons, countries, organizations, and events. It is a basic task in natural language processing. Electronic medical records (EMRs) are standardized professional texts that record patients' treatment activities in hospital. Research on NER for EMRs has progressed through a dictionary-and-rule stage, a machine learning stage, and a deep learning stage, and entity extraction accuracy has improved steadily. However, datasets are scarce and medical record data are sensitive, so there is still less research on EMRs than on other free text. Few results of medical record NER research are actually used in practical projects. Previous studies focused on improving recognition accuracy for common entities. They ignored the practical requirement to recognize many entity types with high accuracy. To address this problem, this paper makes the following contributions.

(1) Mainstream medical record NER tasks lack datasets, so a new EMR dataset was collaboratively annotated for training and testing. The dataset covers many types of common entities. Patient-related information has been hidden, and the source data were provided by a large tertiary hospital.

(2) To improve NER accuracy on medical records, an improved model, RoBERTa-BiLSTM-CRF, is proposed. The model is evaluated on two datasets. On both, it outperforms the traditional Word2Vec-BiLSTM-CRF model and the BERT-BiLSTM-CRF model.

(3) To put NER research results to real use in medical research and production environments, the RoBERTa-BiLSTM-CRF model is used to mine an entity extraction dictionary from medical text. A named entity recognition utility tool is also designed and developed, and it can accurately extract entities from medical records.
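The abstract does not describe the CRF layer of the model. In BiLSTM-CRF taggers generally, the CRF layer does not choose each token's tag independently. It selects the single highest-scoring tag sequence by Viterbi decoding. The decoder combines per-token emission scores, such as those a BiLSTM produces, with a tag-to-tag transition matrix. The resulting tags are then grouped into entity spans.

The following is a minimal pure-Python sketch of those two steps, not the paper's implementation. It makes several assumptions. The tags follow a BIO scheme. Start and end transition scores are omitted. The tag set (`O`, `B-SYM`, `I-SYM`), the scores, and the helper names `viterbi_decode` and `bio_to_spans` are all hypothetical illustrations.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the best tag-index path for a linear-chain CRF (hypothetical helper).

    emissions[t][j]   : score of tag j at token t (non-empty sequence assumed).
    transitions[i][j] : score of moving from tag i to tag j.
    Start/end transition scores are omitted in this sketch.
    """
    num_tags = len(transitions)
    best = list(emissions[0])  # best[j]: top score of any path ending in tag j
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Previous tag i maximizing best[i] + transitions[i][j].
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


def bio_to_spans(
    tokens: Sequence[str], tags: Sequence[str], sep: str = " "
) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_type, text) pairs (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == cur_type:
            cur_tokens.append(token)  # continue the open entity
            continue
        if cur_type is not None:  # any other tag closes the open entity
            spans.append((cur_type, sep.join(cur_tokens)))
        if tag.startswith(("B-", "I-")):
            # A stray I- tag is treated as the start of a new entity.
            cur_type, cur_tokens = tag[2:], [token]
        else:
            cur_type, cur_tokens = None, []
    if cur_type is not None:
        spans.append((cur_type, sep.join(cur_tokens)))
    return spans


# Toy example with made-up scores: tags 0 = O, 1 = B-SYM, 2 = I-SYM.
TAGS = ["O", "B-SYM", "I-SYM"]
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # O -> I penalized
tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(tags)  # ['O', 'B-SYM', 'I-SYM']
print(bio_to_spans(["severe", "chest", "pain"], tags))  # [('SYM', 'chest pain')]
```

Picking each token's top emission alone would give `O, I-SYM, I-SYM`, which is not a valid BIO sequence. The penalized `O -> I-SYM` transition lets Viterbi decoding return the valid `O, B-SYM, I-SYM` instead. This is the usual motivation for putting a CRF on top of a BiLSTM. For character-level tokenization, `sep=""` would join characters without spaces.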