| Data mining draws on techniques such as visualization, statistics, and artificial intelligence, and plays an important role in extracting potentially valuable information hidden in very large datasets. With the development of second-generation sequencing technology, genomic data have grown exponentially. China, as a populous country, has rapidly accumulated public data on public health and disease prevention. Brucellosis, a category B infectious disease in China, has rebounded strongly since the start of the 21st century. To gain an in-depth understanding of Brucella and so help prevent the brucellosis it causes, this thesis applies data mining and bioinformatics in three ways: it performs a high-resolution, genome-wide analysis to determine the population structure, spatial and temporal distribution, transmission pathways, and functional differences of Brucella; it analyzes nationwide epidemiological survey data on brucellosis to obtain its spatial and temporal distribution characteristics, influencing factors, and time series models; and it builds a visual analytics platform for both the genomic and the epidemiological survey data. The main research of this thesis comprises the following three parts. (1) A genomic study of Brucella canis based on publicly available genomic data worldwide. Using 91 strains from around the world, four phylogenetic groups (PGs) were identified, evolutionary trees based on core SNPs and molecular clock models were constructed to analyze population structure and spatio-temporal distribution, and COG functional clustering was used to investigate functional differences. The four PGs were significantly correlated with their geographic origins. Among PG1 to PG3, there was transmission from Asia to Africa and from Europe to America, while PG4 colonized North America with progressive deletion of drug resistance genes. The presence/absence spectrum of functional genes differed among the four PGs: PG3 was the most functionally complete, and each of the other PGs had its own functional gene deletions, with PG2 in particular missing most ABC-type transporter system components. (2) A study of brucellosis based on epidemiological survey data from mainland China. Analysis of spatial distribution characteristics showed that bovine and sheep brucellosis tended to spread across the country, whereas the incidence of human brucellosis was relatively stable and concentrated in the north, showing clear spatial aggregation nationwide, with Inner Mongolia persisting as a local high-incidence cluster over a long period. Analysis of the factors influencing the incidence of human brucellosis showed that temperature had a significant negative effect and sheep stocking a significant positive effect. Time series analysis showed that the incidence of human brucellosis has distinct seasonal cycles, both at present and in the foreseeable future, and that temperature is a Granger cause of human brucellosis incidence. (3) A visual analysis platform developed for Brucella. For genome analysis, the platform provides software services for genome assembly, genome annotation, single nucleotide polymorphism identification, drug resistance gene detection, and evolutionary tree construction; used in combination, these software packages allow a process-oriented analysis of the genome. For incidence and prevalence data, the platform visualizes the spatial distribution and aggregation of incidence, as well as changes by time period, temporal grouping, regional grouping, and the spatial distribution of prevalence. It thus provides an intuitive and convenient scheme and reference for the visual analysis of epidemiological survey data on brucellosis. |
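
The abstract refers to evolutionary trees built from core SNPs without defining the term. As a rough illustration only, and not the pipeline used in the thesis, the sketch below treats a core SNP site as an alignment column in which every strain carries an unambiguous base (A, C, G, or T) and at least two different bases occur. The function names, the toy strain names, and the toy sequences are all hypothetical and exist only for this example.

```python
from typing import Dict, List

UNAMBIGUOUS = set("ACGT")


def core_snp_sites(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based column indices that are core SNP sites.

    Assumed definition (for illustration): a column is "core" if every
    strain has an unambiguous base there (no gap, no N), and it is a SNP
    if more than one distinct base is present.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        # Keep the column only if it is both core and variable.
        if column <= UNAMBIGUOUS and len(column) > 1:
            sites.append(i)
    return sites


def snp_distance(a: str, b: str, sites: List[int]) -> int:
    """Count the core SNP sites at which two aligned sequences differ."""
    return sum(1 for i in sites if a[i] != b[i])


# Hypothetical toy alignment; not real Brucella data.
toy = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACCTAC-TAT",
    "strain_D": "ACCTACGTNT",
}

sites = core_snp_sites(toy)
print(sites)
print(snp_distance(toy["strain_A"], toy["strain_C"], sites))
```

Columns containing a gap or an ambiguous base are dropped entirely, so the pairwise distances are computed only over positions shared by all strains. A real analysis would start from whole-genome alignments or variant calls and pass the resulting core SNP alignment to a tree-building tool.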