| Coronaviruses are an important class of pathogens that includes viruses capable of infecting humans, such as SARS-CoV, MERS-CoV, and SARS-CoV-2. Since the outbreak of COVID-19, coronavirus research has attracted great attention and substantial research resources, greatly advancing human understanding of coronaviruses and producing a large body of scientific literature that contains a great deal of tacit knowledge. However, the volume of this literature exceeds what humans can analyze unaided, so literature analysis methods are needed to process the information it contains. Literature analysis rests on two key techniques. One is the bibliometric method, which quantitatively analyzes the characteristics of the literature through mathematical and statistical methods to reveal research hot spots and trends. The other is knowledge discovery, which mines potential knowledge in the literature by building knowledge discovery models. Based on these methods, this dissertation analyzes the current status and trends of coronavirus research using bibliometric methods and constructs a knowledge discovery model based on a heterogeneous network of MeSH terms, providing an effective method for discovering fine-grained knowledge associations. An empirical study of coronavirus is then carried out with this model. The dissertation uses literature research, bibliometric analysis, modeling, and empirical methods. First, we clarify the relevant concepts, identify literature data sources, construct a literature search formula based on coronavirus species and related research literature, and use it to retrieve coronavirus research literature from the data sources. Second, building on the bibliometric analysis, the MeSH terms of the literature are extracted to build a heterogeneous network from their co-occurrence relationships, and a knowledge discovery model is formed from this network and the random walk with restart
algorithm. The accuracy of the knowledge discovery model is tested with literature data, and the model's predictions are validated against literature data external to the model. The research in this dissertation is as follows: (1) A bibliometric analysis of coronavirus research is conducted. A search formula, constructed by combining expert opinion with relevant literature studies, is used to retrieve coronavirus research literature from the PubMed database. The analysis is carried out with CiteSpace and VOSviewer along five dimensions: time, country, institution, author, and topic. The study finds that the quantity of coronavirus-related research literature remains high but shows a downward trend, and researchers' attention is beginning to shift. China and the United States are the countries with the most published literature; developed countries are still the mainstay of research, but emerging forces such as China and India have also appeared. Harvard University is the institution that has published the most relevant research articles, while the most productive Chinese research institutions rank lower, indicating that China's investment in coronavirus research needs to be strengthened. Yuen Kwok-yung's team at the University of Hong Kong is the leading team in the coronavirus research field; highly productive researchers in the United States and Europe have also built research teams centered on themselves, but there is little communication between teams. The treatment and prevention of coronavirus infection remains a key research focus, while some scholars have turned to the impact of the virus on society and the control of viral infections. (2) A knowledge discovery model based on a heterogeneous network of MeSH terms is constructed and tested. The model consists of two parts: the MeSH term heterogeneous network and the network mining algorithm. The heterogeneous network is constructed from the group matching words under the MeSH Terms
field in the literature data, and random walk with restart (RWR) is chosen as the network mining algorithm. RWR mines potential associations in the network, and the associations with the highest probabilities are taken as the prediction results. After the model is constructed, its predictive performance is evaluated with ROC curves, and its accuracy is measured by the AUC. The average AUC over multiple tests is 0.804, indicating that the model constructed in this dissertation predicts well. (3) An empirical study is conducted with the knowledge discovery model. After analyzing the research characteristics of the coronavirus prevention and treatment fields and entering the corresponding MeSH terms into the model, the model predicted that "how to treat new coronavirus patients with acute kidney injury", "how to treat new coronavirus patients with neurological disease", "how to develop antiviral drugs and vaccines based on the coronavirus M protein", and "how to prevent vaccination-induced thrombosis" might be potential research directions for the treatment and prevention of coronavirus infection. These research directions have been confirmed by new publications in the PubMed database outside the literature used in the model, indicating that the knowledge discovery model predicts well and can discover fine-grained knowledge associations in coronavirus literature. |
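
The random walk with restart step described in the abstract can be illustrated with a minimal sketch. It assumes a small undirected, weighted co-occurrence network and a restart probability of 0.3. The node labels, edge weights, restart value, and function names are all invented for illustration; the abstract does not specify the dissertation's actual network, parameters, or implementation.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (term, term, weight) edges."""
    adj: Graph = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(adj: Graph, seed: str, restart: float = 0.3,
                             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate p <- restart * e_seed + (1 - restart) * W p until p stops changing."""
    p = {n: (1.0 if n == seed else 0.0) for n in adj}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in adj}
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # The walker leaves u for v in proportion to the edge weight.
                nxt[v] += (1.0 - restart) * p[u] * w / total
        change = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if change < tol:
            break
    return p


def rank_unlinked(adj: Graph, seed: str, scores: Dict[str, float]) -> List[str]:
    """Rank nodes with no direct edge to the seed: candidate new associations."""
    pool = [n for n in adj if n != seed and n not in adj[seed]]
    return sorted(pool, key=lambda n: scores[n], reverse=True)


# Toy network: weights stand in for co-occurrence counts (hypothetical values).
toy_edges = [("A", "B", 3.0), ("A", "C", 1.0), ("B", "C", 2.0), ("C", "D", 1.0)]
graph = build_graph(toy_edges)
scores = random_walk_with_restart(graph, seed="A")
candidates = rank_unlinked(graph, "A", scores)
```

In this sketch the steady-state probability of each node measures its proximity to the seed term, and nodes that score highly without being directly linked to the seed are treated as candidate associations. A held-out evaluation such as the ROC/AUC test mentioned above would remove known edges before the walk and check how highly they are ranked.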