| The large volume of literature and high-quality content in biomedical databases has become an important resource for biomedical research. Extracting the required information quickly and accurately from this literature, and discovering potential connections within it, is of great significance for biomedical text mining. Biomedical named entity recognition is a fundamental task in biomedical text mining: accurately identifying entities in the literature is a key step for downstream tasks such as information extraction, question answering, and knowledge graph construction. Compared with named entity recognition in other fields, biomedical named entity recognition must handle entity names that mix letters, digits, and special characters, words with multiple meanings, and nested entities, all of which make the task considerably more challenging. Current research on biomedical named entity recognition focuses mainly on genes, diseases, chemicals, and species. With growing attention to animal-derived disease research, extracting information about proteins, bacteria, viruses, and phenomena has become a hot topic in biomedical research; however, common biomedical datasets do not cover all of these entity types. Current deep learning-based named entity recognition methods mainly use sequence labeling models, which have achieved good results in other fields; however, because such models typically assign a single tag to each token, nested entities limit their effectiveness in the biomedical field. In this thesis, we propose a deep learning-based named entity recognition method that addresses these problems specific to the biomedical field, and we design and implement a biomedical named entity recognition system. The main research work is as follows: (1) A named entity recognition dataset for 
animal-derived diseases is constructed. The dataset focuses on the entity types of interest in animal-derived disease research: genes, diseases, chemicals, and species are annotated automatically with the PubTator tool, and on that basis four further entity types, namely bacteria, viruses, proteins, and phenomena, are labeled manually. The dataset was annotated semi-automatically from more than 20,000 documents collected from the PubMed literature database. The resulting dataset contains 6,067 samples and 20,999 entities, and it lays a foundation for research on named entity recognition for animal-derived diseases. (2) A biomedical named entity recognition method based on a coordinate convolutional network and a biaffine model is proposed. To address nested entities and indistinct entity features, the model uses BioBERT to obtain contextual representations of the text and incorporates coordinate convolution on top of the biaffine model to recognize entities and their types. The proposed method achieves an F1 score of 79.01% on the GENIA dataset, 1.01% to 2.78% higher than other models, and an F1 score of 63.19% on the animal-derived disease dataset constructed in this thesis, 4.87% higher than the baseline model. These results demonstrate the effectiveness and reliability of the method. (3) A biomedical named entity recognition system is designed and implemented. The system is built with the Django framework and uses a MySQL database to store and manage data so that users can retrieve and access it easily. The system quickly identifies the biomedical named entities in input text and stores the results in the database, which supports subsequent research by experts in the biomedical field. |
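
The span-based idea behind contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy scores, and the example tokens are all hypothetical, and the real model would compute its inputs from BioBERT representations and apply learned convolutions. The sketch only shows, under those assumptions, how a biaffine scorer rates one (start, end) token pair, how a CoordConv-style step appends position channels to a span table, and why scoring each span separately allows nested entities to be returned together.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(h_start: Vector, h_end: Vector,
                   U: Matrix, w: Vector, b: float) -> float:
    """Score one (start, end) pair for one label:
    h_start^T U h_end + w . [h_start; h_end] + b."""
    bilinear = sum(
        h_start[i] * U[i][j] * h_end[j]
        for i in range(len(h_start))
        for j in range(len(h_end))
    )
    linear = sum(wk * xk for wk, xk in zip(w, h_start + h_end))
    return bilinear + linear + b


def add_coord_channels(table: List[List[Vector]]) -> List[List[Vector]]:
    """CoordConv-style step: append the normalized row and column index
    to the feature vector of every cell in an n x n span table."""
    n = len(table)
    denom = max(n - 1, 1)  # avoid division by zero for a single token
    return [
        [table[i][j] + [i / denom, j / denom] for j in range(n)]
        for i in range(n)
    ]


def decode_spans(label_scores: List[List[Vector]],
                 labels: List[str]) -> List[Tuple[int, int, str]]:
    """Pick the best label for every span with start <= end and keep all
    spans whose best label is not "O", whether or not they overlap."""
    n = len(label_scores)
    spans = []
    for i in range(n):
        for j in range(i, n):
            scores = label_scores[i][j]
            best = max(range(len(labels)), key=lambda k: scores[k])
            if labels[best] != "O":
                spans.append((i, j, labels[best]))
    return spans


# Hypothetical toy input: four tokens and hand-written scores in which the
# span 1..2 is a protein nested inside the span 0..3, which is a DNA entity.
labels = ["O", "protein", "DNA"]
n_tokens = 4
toy_scores = [[[1.0, 0.0, 0.0] for _ in range(n_tokens)] for _ in range(n_tokens)]
toy_scores[1][2] = [0.0, 1.0, 0.0]
toy_scores[0][3] = [0.0, 0.0, 1.0]

nested = decode_spans(toy_scores, labels)
# nested == [(0, 3, "DNA"), (1, 2, "protein")]
```

Because every (start, end) cell receives its own label, the inner and outer entity do not compete for per-token tags, which is the limitation of sequence labeling noted above.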