Since 2020, COVID-19 has spread across the world, with a profound and irreversible impact on production and daily life, and countries everywhere have suffered heavy losses. Although some countries, such as China, took timely and effective measures to curb the spread of the epidemic, mutated strains of the novel coronavirus have swept through again and again. The mutated strains are more contagious and survive longer under exposure conditions. At the same time, the volume of research literature on COVID-19 has grown rapidly. If researchers obtain relevant information only by manual reading, the sheer volume of data consumes a great deal of time and effort. Given the urgency of the epidemic, knowledge extracted from the literature with biomedical data mining techniques will help researchers advance drug and vaccine development. For biomedical data mining, named entity recognition and relation extraction are two key tasks, and biomedical named entity recognition is also regarded as a sub-process of biomedical relation extraction. Methods for biomedical relation extraction can be divided into rule-based methods, statistical methods, and machine learning methods. Machine learning methods can be further divided into traditional methods based on feature engineering and methods based on deep learning. Annotating corpora is costly even in general domains, and because of the specialized nature of biomedical data, annotated biomedical data is scarce. Deep learning methods that do not rely on feature engineering have therefore gradually become mainstream. Among deep learning methods for natural language processing, the "pre-trained language model + fine-tuning" paradigm stands out and shows excellent performance on a variety of tasks. However, there is still considerable room for improvement in its performance in the biomedical field. Therefore, this paper first proposes a pre-trained language model for COVID-19: using the latest unlabeled COVID-19 corpus, the model is pre-trained again for the specialized COVID-19 domain, yielding a three-stage pre-trained model, the P3 model, that improves performance on downstream data mining tasks. Biomedical texts contain not only simple binary relations but also complex overlapping relations. We therefore propose a COVID-19 multi-relation extraction model. To handle overlapping relations in the text, this paper proposes entity-position encoding, which introduces the absolute and relative positions of each entity in the text and adds the distance between the entities of a pair as additional information to the model, making full use of the semantic relationships that exist between biomedical entities to improve data mining performance. Finally, this paper proposes a framework for constructing a COVID-19 knowledge graph. COVID-19 data is stored and represented in the knowledge graph, and knowledge inference and time slicing are applied to the graph to discover potential biomedical relationships, offering new possibilities for the development of drugs and vaccines for COVID-19.
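To make the entity-position encoding idea concrete, the following is a minimal sketch of the three kinds of positional information the abstract names: absolute entity position, relative position, and entity-pair distance. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the names `EntityPositionFeatures`, `relative_offsets`, and `entity_position_features` are hypothetical, spans are token-level with an exclusive end index, relative position is taken as the signed token offset to the nearest edge of the entity span, and pair distance is taken as the number of tokens between the two spans. The example sentence is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive (assumed convention)


@dataclass
class EntityPositionFeatures:
    head_abs: int            # absolute token index where the head entity starts
    tail_abs: int            # absolute token index where the tail entity starts
    rel_to_head: List[int]   # per-token signed offset to the head entity span
    rel_to_tail: List[int]   # per-token signed offset to the tail entity span
    pair_distance: int       # number of tokens between the two entity spans


def relative_offsets(n_tokens: int, span: Span) -> List[int]:
    """Signed offset of every token to the nearest edge of `span` (0 inside it)."""
    start, end = span
    offsets = []
    for i in range(n_tokens):
        if i < start:
            offsets.append(i - start)       # negative: token is left of the entity
        elif i >= end:
            offsets.append(i - (end - 1))   # positive: token is right of the entity
        else:
            offsets.append(0)               # token belongs to the entity
    return offsets


def entity_position_features(tokens: List[str], head: Span, tail: Span) -> EntityPositionFeatures:
    """Collect absolute, relative, and pair-distance features for one entity pair."""
    first, second = sorted([head, tail])
    # Overlapping or adjacent spans get distance 0 (an assumption, not from the paper).
    gap = max(0, second[0] - first[1])
    return EntityPositionFeatures(
        head_abs=head[0],
        tail_abs=tail[0],
        rel_to_head=relative_offsets(len(tokens), head),
        rel_to_tail=relative_offsets(len(tokens), tail),
        pair_distance=gap,
    )


tokens = ["Remdesivir", "inhibits", "SARS-CoV-2", "RNA", "polymerase"]
features = entity_position_features(tokens, head=(0, 1), tail=(2, 3))
print(features)
```

In a model, each of these integer features would typically be mapped to a learned embedding and combined with the token representations; how the paper's model actually combines them is not stated in the abstract.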