Anthrax is a zoonotic infectious disease caused by Bacillus anthracis that affects domestic and wild animals worldwide and poses a serious threat to human health. As research has intensified, the number of relevant scientific publications has grown rapidly, and a large amount of biomedical knowledge is scattered among them; domain experts still rely on tedious literature search and reading to obtain it. Rapid and accurate access to biomedical knowledge about anthrax is essential for understanding its etiology, diagnostic methods, and treatments, and for conducting in-depth research. Based on this, this thesis first designs a text mining method and builds a knowledge extraction pipeline system that extracts anthrax-related biomedical knowledge from the literature and forms a knowledge graph, then studies intelligent question answering algorithms based on the knowledge graph, and finally develops a question answering system that helps researchers quickly access relevant data for more efficient scientific research. The main work and results of this thesis are as follows:

(1) A biomedical knowledge extraction pipeline system is designed and implemented. To efficiently extract structured knowledge from scientific literature, this thesis designs a set of text mining methods, integrates relevant text mining tools and scripts, and uses Web technologies with the Python Django web framework to develop a visual knowledge extraction pipeline system. Its main functions include literature acquisition, named entity recognition, sentence segmentation, relationship extraction, entity dictionary and ontology generation, and data curation. Compared with other text mining tools in the field, the system offers comprehensive functionality, a rich set of recognizable entity categories, checkpoints, and support for expert curation.

(2) A biomedical knowledge graph of anthrax is constructed. Relying on the above pipeline system, this thesis extracts structured knowledge from 7764 anthrax biomedical literature abstracts. An entity dictionary was first created with the help of a Beautiful Soup crawler and then refined using the BERT-BiLSTM-CRF model, yielding 14475 biomedical concept entities in 29 categories. The F1 value of the model was 84.9%, which is 5.2% higher than that of the BiLSTM-CRF model. A relationship extraction task was then performed using OpenIE to construct the anthrax biomedical ontology, and finally the ontology data were imported into the Neo4j database to construct a knowledge graph containing 6636 nodes, 6492 edges, 32898 attributes, and 7755 triples.

(3) An intelligent question answering algorithm for anthrax biomedical knowledge is proposed. Based on the knowledge graph, this thesis studies and implements a question answering algorithm that combines template matching and deep learning. Entities are extracted from questions using an anthrax-related entity dictionary and a BioBERT-BiLSTM-CRF model; the F1 value of the model is 85.2%, which is 0.7% higher than that of the BERT-BiLSTM-CRF model. Cypher query statements are then run against the knowledge graph to retrieve answers, which are returned to the user. This provides technical support for the implementation of the Q&A system.

(4) An intelligent question answering system for anthrax biomedical knowledge is designed and implemented. Using the Python Django web framework together with the above Q&A algorithm and knowledge graph, this thesis builds a manageable, multi-module, interactive, and visualized intelligent Q&A system. The system includes a knowledge graph module, a data list module, a literature annotation module, an intelligent question answering module, and a developer module.
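
The question answering step in (3) can be illustrated with a minimal Python sketch that covers only the dictionary lookup and template matching parts and leaves out the BioBERT-BiLSTM-CRF model. The dictionary entries, the question patterns, the relationship types (CAUSES, TREATS), and the `name` property are assumptions made for illustration and are not taken from the thesis.

```python
import re
from typing import Optional

# Hypothetical entity dictionary: lowercase surface form -> canonical node name.
ENTITY_DICT = {
    "bacillus anthracis": "Bacillus anthracis",
    "anthrax": "anthrax",
    "ciprofloxacin": "ciprofloxacin",
}

# Hypothetical question templates: a pattern and the parameterized Cypher it maps to.
TEMPLATES = [
    (re.compile(r"\bwhat causes\b"),
     "MATCH (a)-[:CAUSES]->(b {name: $name}) RETURN a.name AS answer"),
    (re.compile(r"\b(what treats|treatment for)\b"),
     "MATCH (a)-[:TREATS]->(b {name: $name}) RETURN a.name AS answer"),
]


def find_entity(question: str) -> Optional[str]:
    """Return the canonical name of the longest dictionary entry in the question."""
    text = question.lower()
    hits = [
        surface for surface in ENTITY_DICT
        if re.search(r"\b" + re.escape(surface) + r"\b", text)
    ]
    if not hits:
        return None
    return ENTITY_DICT[max(hits, key=len)]


def build_query(question: str) -> Optional[tuple[str, dict]]:
    """Map a question to (Cypher text, parameters), or None if nothing matches."""
    name = find_entity(question)
    if name is None:
        return None
    text = question.lower()
    for pattern, cypher in TEMPLATES:
        if pattern.search(text):
            # The entity is passed as a query parameter, not spliced into the text.
            return cypher, {"name": name}
    return None


if __name__ == "__main__":
    print(build_query("What causes anthrax?"))
```

The returned query text and parameter dictionary could then be passed to a Neo4j session for execution. Passing the entity as a parameter rather than formatting it into the query string keeps user input out of the Cypher text.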