Objective: Whole-genome sequencing was used to perform molecular typing of different types (colonization vs infection) of Staphylococcus aureus (S. aureus) at the genome level, and phylogenetic analyses were used to reveal differences in genetic background between infected and colonized strains. The relationship between resistance gene mutations and resistance phenotypes of S. aureus was analyzed to clarify the mechanisms of antimicrobial resistance. This study also aimed to systematically characterize the differences in resistance phenotypes, molecular characteristics, SNPs, and k-mers between infected and colonized S. aureus and to explore disease-associated markers of S. aureus from multiple angles, so as to provide a theoretical basis for tracking and monitoring highly pathogenic strains and genetic evidence for targeted intervention against S. aureus.

Methods: A cross-sectional design combined with molecular epidemiology was used. S. aureus was isolated and identified from nasal swabs of healthy children in kindergartens and from clinical specimens of hospitalized children with infection. Antimicrobial susceptibility tests and whole-genome sequencing analyses were carried out on all S. aureus isolates to obtain MLST types, spa types, SCCmec types, resistance phenotypes, resistance genes, gene mutations, toxin genes, SNPs, and k-mers. Core SNPs were used to construct a phylogenetic tree to reveal differences in genetic background between the two types of isolates. A combination of univariate analyses (Pearson χ² test, Fisher exact test, and generalized linear mixed model) and a machine learning method (random forest) was used to explore disease-associated markers of S. aureus, and classification and regression tree (CART) analysis and logistic regression were used to reveal high-order interactions between these markers.

Results:
1. The predominant clones of S. aureus: the predominant clones of colonized isolates were CC45 (42 isolates), CC59 (39
isolates), and CC5 (22 isolates); the predominant clones of infected isolates were CC59 (52 isolates), CC188 (40 isolates), CC5 (27 isolates), CC6 (27 isolates), and CC7 (27 isolates).
2. Comparison of phenotypic and molecular characteristics between colonized and infected S. aureus: the rates of resistance to cefoxitin, clindamycin, tetracycline, and ciprofloxacin were significantly higher in infected isolates than in colonized isolates (P < 0.05). The carriage rates of the lnu(A), lnu(G), aadD, tet(K), and dfrG resistance genes were significantly higher in infected isolates than in colonized isolates (P < 0.05). Analysis of resistance phenotypes and mutations showed that double-site mutations in the pbp2 gene (mainly T439V and A35G) were positively correlated with cefoxitin resistance (OR = 8.52), a single point mutation in the pbp4 gene (mainly E398A) was negatively correlated with cefoxitin resistance (OR = 0.31), and rpoB gene mutations (mainly H481N and D320N) were positively correlated with rifampicin resistance (OR = 3.11). The carriage rates of enterotoxin genes (seb and sep), extracellular enzyme-coding genes (splA, splB, splE, and edinC), leukocidin genes (lukD, lukE, and the Panton-Valentine leukocidin genes lukF-PV and lukS-PV), and exfoliative toxin genes (eta and etb) were higher in infected isolates than in colonized isolates, whereas the carriage rates of the enterotoxin genes sec, sec3, seg, seh, sei, sel, sem, sen, seo, and seu were lower in infected isolates than in colonized isolates (P < 0.05).
3. Genetic background analysis of colonized and infected S. aureus: phylogenetic analyses based on core SNPs showed that colonized and infected isolates had similar genetic backgrounds, suggesting that the infected isolates did not represent specific pathogenic clones.
4. Screening for S. aureus disease-associated markers:
(1) In terms of resistance phenotypes, the random forest model finally selected 10 disease-associated
markers (trimethoprim-sulfamethoxazole, clindamycin, erythromycin, rifampicin, cefoxitin, tetracycline, chloramphenicol, penicillin, gentamicin, and ciprofloxacin). The cross-validation accuracy and AUC of this model were 73.41% and 0.74, respectively.
(2) In terms of molecular characteristics, the random forest model finally selected 16 disease-associated markers (sem, etb, splE, sep, ser, mecA, lnuA, sea, blaZ, cat(pC233), blaTEM-1A, aph(3')-III, ermB, ermA, ant(9)-Ia, and ant(6)-Ia). The cross-validation accuracy and AUC of this model were 67.81% and 0.70, respectively.
(3) In terms of SNPs, 23 disease-associated SNPs were selected by combining GWAS with the random forest model (including pyn 2292886 C→T, sasA 2839985 C→T, sdrC 618337 C→A, spa 12433 T→C, etc.). The cross-validation accuracy and AUC of this model were 84.39% and 0.83, respectively.
(4) In terms of k-mers, 20 disease-associated k-mers (located in the exotoxin-related genes sep, lukS-PV, and lukF-PV; the adhesion-related genes fnbA, sdrC, and bbp; the extracellular enzyme-coding gene coa; and the immune regulation-related gene ebh) were selected by combining GWAS with the random forest model. The cross-validation accuracy and AUC of this model were 73.8% and 0.80, respectively.
5. Classification and regression tree analysis combined with logistic regression revealed interactions among disease-associated markers: the phenotypic combination of clindamycin resistance and trimethoprim-sulfamethoxazole susceptibility (OR = 12.84), the molecular characteristic combination sem(+)-etb(+) (OR = 2.55), the SNP combination rs2292886(mutant)-rs2035687(mutant)-rs1231151(wild type)-rs815732(wild type)-rs751346(mutant) (OR = 31.50), and the k-mer-related gene combinations fnbA(+)-lukF-PV(-)-lukS-PV(-)-sdrC(+) and fnbA(+)-lukF-PV(+)/lukS-PV(+) (OR = 15.83 and OR = 14.05, respectively) were positively correlated with an increased risk of disease.

Conclusions:
1. The predominant clones of colonized S. aureus were CC45, CC59, and CC5, and the predominant clones of infected S. aureus were
CC59, CC188, CC5, CC6, and CC7. All the predominant clones were found among both colonized and infected isolates, suggesting that the genetic background of colonized and infected isolates was similar.
2. Based on the high-dimensional genetic characteristics of the whole genome, we combined GWAS with a random forest model to screen 23 disease-associated SNPs and 20 disease-associated k-mers, and the models fit well. These markers provide genetic evidence for tracing highly pathogenic strains and for precisely targeted intervention against S. aureus.
3. CART analysis combined with logistic regression was used to explore high-order interactions among disease-associated markers. S. aureus isolates carried a higher pathogenic risk when they had the combination of clindamycin resistance and trimethoprim-sulfamethoxazole susceptibility among resistance phenotypes; the combination sem(+)-etb(+) among molecular characteristics; the combination rs2292886(mutant)-rs2035687(mutant)-rs1231151(wild type)-rs815732(wild type)-rs751346(mutant) among SNPs; or the combination fnbA(+)-lukF-PV(-)-lukS-PV(-)-sdrC(+) or fnbA(+)-lukF-PV(+)/lukS-PV(+) among the genes corresponding to the k-mers.
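
The odds ratios reported in the Results relate the presence of a marker (or marker combination) to infection versus colonization. As a minimal sketch of how such an estimate can be derived from a 2x2 table, the Python function below computes an unadjusted odds ratio with a Wald (Woolf) 95% confidence interval. This is an illustration only: the study itself used logistic regression, the function name `odds_ratio` is hypothetical, and the counts in the usage example are invented placeholders, not data from this study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: infected isolates carrying the marker
    b: infected isolates without the marker
    c: colonized isolates carrying the marker
    d: colonized isolates without the marker
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    cells = [float(a), float(b), float(c), float(d)]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error stay finite.
    if min(cells) == 0:
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the study).
    est, lo, hi = odds_ratio(30, 70, 10, 90)
    print(f"OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An odds ratio above 1 with a confidence interval that excludes 1 indicates that the marker is more common among infected isolates; a value below 1 (such as the OR = 0.31 reported for the pbp4 mutation and cefoxitin resistance) indicates a negative association. Estimates from logistic regression may differ from this unadjusted calculation when other covariates are included in the model.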