With the growing popularity of Internet technology, the volume of medical literature is expanding rapidly. Most of this literature is stored in unstructured form, which makes its information hard to extract and expensive to annotate manually. In medical literature, abstracts record the key information, so it is increasingly important to extract evidence-based medical data from large numbers of abstracts and analyze it in order to develop new synthetic drugs to treat diseases. Named entity recognition, a basic and important task in natural language processing, can extract standardized entities from unstructured medical literature, and these entities can then be used to build medical knowledge graphs. A human-computer collaborative medical literature annotation system can extract information from a large amount of medical data in a short time with only a small amount of manual annotation, improving efficiency and supporting downstream data mining. We first propose a BERT-BiLSTM-CRF entity recognition model for named entity recognition in medical literature, and then develop the "Human-Computer Collaborative Medical Literature Annotation System" for efficient annotation of evidence-based medical literature. The work consists of the following two parts.

(1) A BERT-BiLSTM-CRF based named entity recognition model for medical literature. BERT is a pre-trained deep learning model that has emerged in recent years. In this paper, we first use BERT to produce the word vectors: it encodes the input sequence with a bidirectional Transformer, yielding representations that capture the relationship between words at any two positions in the sentence. The BERT model achieves an F1 of 75.91% on the NCBI-disease standard dataset for the entity recognition task. Adding bidirectional LSTM and CRF layers to
the vectors output by BERT for further feature extraction improves the F1 value by 0.58 percentage points. In this paper, two traditional machine learning models and five deep learning models are applied to entity recognition on PubMed evidence-based medical literature in the biomedical field, and the experimental results are compared and analyzed. Each model was run three times and the results were averaged to obtain the final figures. The comparison shows that BERT-BiLSTM-CRF performs best: its F1 is 76.49% on the NCBI-disease standard dataset and 74.18% on the PubMed disease and symptom corpus.

(2) We developed the "Human-Computer Collaborative Medical Literature Annotation System" for rapid structured extraction from evidence-based medical literature. The system combines manual annotation with intelligent annotation and manages common named entity recognition models, so that annotators can train the models and test their performance after annotating. Once a specific task has been labeled, the system can be used to create standard datasets for training and testing models. The system can quickly extract information from a large amount of evidence-based medical data with little manual effort, reducing the cost of manual annotation and the human resources, material resources, and time required. This paper presents the process design and functional design of the human-computer collaborative medical literature annotation system, along with the parts of the development work in which I participated.
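
The F1 figures reported above are averages over three runs per model. As an illustration only, the following minimal sketch shows one common way such a score can be computed. It assumes entity-level exact-match scoring over BIO tags, which the abstract does not specify; the function names and the toy tag sequences are hypothetical and are not taken from the paper.

```python
from statistics import mean


def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored (strict reading of the BIO scheme).
    """
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a prediction counts only if span and type both match."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def averaged_f1(gold_tags, runs):
    """Average the F1 of several independent runs against the same gold tags."""
    return mean(entity_f1(gold_tags, pred_tags) for pred_tags in runs)


# Toy data (hypothetical): one gold sequence and the output of three runs.
gold = ["B-Disease", "I-Disease", "O", "B-Disease", "O"]
runs = [
    ["B-Disease", "I-Disease", "O", "B-Disease", "O"],  # both entities found
    ["B-Disease", "I-Disease", "O", "O", "O"],          # second entity missed
    ["B-Disease", "O", "O", "B-Disease", "O"],          # first entity truncated
]
final_f1 = averaged_f1(gold, runs)
```

On this toy data the three runs score 1.0, about 0.667, and 0.5, so `final_f1` is about 0.722. Under exact-match scoring a truncated span counts as both a false positive and a false negative, which is why the third run is penalized more heavily than the second.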