| The spread of multidrug-resistant (MDR) pathogens and the emergence of new infectious pathogens are increasing deaths from infectious disease worldwide, while the rising number of new tumor cases and the multidrug resistance of tumors in recent years have made cancer a global public health issue. The search for novel compounds with anti-infective and antitumor activity has therefore become necessary in recent decades. Natural products (NPs) and their derivatives from bacteria are important sources for drug discovery because of their structural diversity and wide range of pharmacological activities. However, inefficient dereplication, low production, and the long time and high cost of finding, isolating and characterizing compounds make traditional approaches unsuitable and impede the discovery of novel compounds. More powerful techniques are required to find novel natural chemical structures. Bacterial genome sequencing has revealed that bacteria may synthesize 10 times more NPs than previously thought. Drawing on the massive volumes of genomic sequence data in public databases, genome mining for biosynthetic gene clusters (BGCs) may uncover previously unknown, cryptic metabolic and biosynthetic potential, including clusters able to synthesize novel molecules that may serve as therapeutic candidates or chemical mediators. Mining and prioritizing NP BGCs is the most important stage in the identification of novel compounds. Here, to better investigate the diversity and evolution of metabolic and biosynthetic potential in bacteria (Burkholderia, Pseudomonas and Streptomyces) for novel drug discovery, a systematic framework was built by combining different genome mining tools and was applied to identify and prioritize potential BGCs and to predict the structures of novel compounds via comparative genomic analysis. The results showed that, given the substantial number of uncharacterized clusters in their genomes and the ready availability of genetic 
manipulation methods for these genera, these bacteria are promising and desirable genera with great potential for bioactive compound discovery. 1. Species-level and subspecies-level abundance and diversity of BGCs in three bacterial genera in relation to the rpoB-based evolutionary tree. First, genomic data available in public databases for Gram-positive Streptomyces (62 reference genome sequences) and Gram-negative Burkholderia (248 genome sequences) and Pseudomonas (37 reference genome sequences) were selected to explore their NPs. Initially, a comprehensive rpoB evolutionary tree was built for the three genera to examine BGC prevalence across species in relation to their rpoB phylogeny; the three genera in NCBI could be divided into 9 (Streptomyces), 14 (Pseudomonas) and 7 (Burkholderia) subgroups, respectively. Using antiSMASH, a total of 1868 BGCs in Streptomyces, 363 in Pseudomonas and 4729 in Burkholderia were predicted for the biosynthesis of secondary metabolites. Mapping the BGCs onto the rpoB-based evolutionary trees showed that BGC diversity in each genus varies greatly across species and even among subspecies of the same species, implying that subspecies-level genome sequencing may reveal greater BGC diversity and potentially valuable derivatives of a given compound. According to these results, searching for secondary metabolites at the subspecies level may be an alternative or complementary way to identify new medicinal molecules from microorganisms. Many common types of BGCs were present in all species, implying extensive horizontal gene transfer of these BGCs among different bacterial genera and an important role in the survival of these bacteria in different environments. 2. Genome-level correlation between bacteria for BGCs and prioritization of NP producers. Comparative genomics was used to determine the whole 
genome relatedness among the species and to examine the diversity, abundance and evolution of BGCs in the three genera. Streptomyces, Burkholderia and Pseudomonas have 45, 26 and 19 major classes of BGCs, respectively. A broad range of BGC abundance was detected in the majority of the groups, ranging from 19 to 48 (average = 30), 7 to 26 (average = 16) and 6 to 16 (average = 10) BGCs per genome in Streptomyces, Burkholderia and Pseudomonas, respectively. Most Burkholderia bacteria contain 22 BGCs. The average genome size was calculated as 8.5 Mbp, 7.2 Mbp and 6.0 Mbp in Streptomyces, Burkholderia and Pseudomonas, respectively. The most prevalent classes of BGCs differed among the three genera, and these classes accounted for roughly half of all BGCs detected in a single genome. The most common BGCs in Streptomyces are of the terpene, siderophore, NRPS and hybrid types; in Burkholderia, the terpene, non-ribosomal peptide (NRP), RiPP-like, hserlactone (homoserine lactone) and NRPS-like types; and in Pseudomonas, the NAGGN, NRP, redox-cofactor, RiPP-like, aryl polyene, beta-lactone, NRPS-like and ranthipeptide types. We also found that a strain may have multiple copies of a BGC class. Even BGCs for one specific major class of compounds differ markedly among the three genera. Taking bacteriocin BGCs as an example, the most prevalent BGCs in Burkholderia are predicted to produce capistruin and the Linocin_M18 bacteriocin, and those in Streptomyces to produce Zoocin_A, LAPs and Lanthipeptide_class_Ⅱ. Meanwhile, Pseudomonas species are predicted to produce class Ⅲ bacteriocins (>10 kDa). Evolutionary analysis of Burkholderia species using three methods revealed three distinct, more ancestral clades, with some differences: (i) both rpoB- and whole genome-based phylogenies showed more evolutionary variation than the phylogeny based on core genes of RiPP-like BGCs; (ii) the evolution of the core genes of RiPP-like BGCs showed high variation, probably related to different habitats; (iii) genome scanning with NaPDoS revealed a 
high abundance of core domains of polyketide- and NRP-type BGCs. In addition, gene cluster family (GCF) networks were built from the BGCs of a specific genus, which will be used to evaluate the relatedness and evolutionary distance between these predicted BGCs and known BGCs. 3. Isolation of NP-producing Streptomyces and Burkholderia and structural elucidation of bioactive NPs. From a collection of bacteria isolated from soil samples taken in unexplored mountain habitats (Tiesi Gang in Zoushi Town, Changde City, Hunan, China), two strains were obtained: Streptomyces sp. CS-7, with strong antibacterial activity, and Burkholderia sp. S-53, with a comparatively rapid growth rate. Strain identification by two molecular methods revealed their closest species. Genome mining showed that Streptomyces sp. CS-7 has many cryptic BGCs with unknown products, as well as several BGCs with very low similarity to known BGCs. Likewise, complete sequence scanning of Burkholderia sp. S-53 uncovered many unknown BGCs presumed to encode novel compounds. Two important compounds with antimicrobial and cytotoxic effects, mayamycin B and mayamycin, were extracted and identified from Streptomyces sp. CS-7. 4. Conclusion. Mining and prioritizing BGCs for NP production is the most important stage in the identification of novel compounds in bacteria. To demonstrate the diversity, abundance and evolution of BGCs among bacteria, genome sequencing data for three bacterial genera available in NCBI were systematically analyzed at the species and subspecies levels using multiple bioinformatic tools. First, the in silico metabolic and biosynthetic potential of Gram-positive Streptomyces species and Gram-negative Burkholderia and Pseudomonas bacteria was uncovered using larger-scale samples: (i) a huge number of important cryptic BGCs were found, and some BGCs were prioritized for further application; (ii) Streptomyces were found to have a higher number of BGCs for NP production than Burkholderia and Pseudomonas 
bacteria; (iii) differences in the most prevalent BGCs among the three genera and high BGC diversity at the subspecies level were observed; (iv) differences in the diversity of BGCs for a specific major class of compounds, exemplified by RiPPs and bacteriocins, were observed among the three genera; (v) whole-genome relatedness among the species of each genus was determined, and BGC family networks were built among the predicted BGCs to show their distance from known BGCs; (vi) evolutionary diversity of RiPP-like BGCs in Burkholderia and a high abundance of NRPS and PKS core domains across the whole genomes were observed. Second, (i) two strains (CS-7 and S-53), belonging to Streptomyces and Burkholderia respectively, were isolated from underexplored habitats, and two important bioactive compounds were identified from CS-7; (ii) molecular identification of CS-7 and S-53 was conducted using single-gene and whole-genome sequences with different methods; (iii) high diversity, distribution and evolution of BGCs were found in both strains. In conclusion, in silico genome mining has rebooted natural product research, making it more specific and precise, and has started a second golden age of drug discovery. Knowledge of comparative genomics, evolutionary linkages, genome-wide diversity and distribution patterns of BGCs is essential for prioritizing particular BGCs for drug development and identifying the most prolific producer strains. Substantial heterogeneity across bacterial species is presumed to underlie their outstanding biosynthetic and metabolic potential, making them plausible candidates for the identification of novel molecules. Despite extensive previous investigations by functional identification of NPs, our genomic comparison revealed that bacteria possess wide and unique NP biosynthetic potential, suggesting that they remain a viable source of new metabolites. |
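
The per-genome BGC averages and average genome sizes reported in section 2 can be put on a common scale by dividing one by the other. The sketch below is only an illustration of that arithmetic and is not part of the original analysis: the helper names `bgc_density` and `rank_by_density` are hypothetical, and the Burkholderia genome size of 7.2 is assumed to be in Mbp like the other two values.

```python
# Averages as reported in section 2 of the abstract.
# Assumption: the Burkholderia genome size "7.2" is in Mbp.
REPORTED = {
    "Streptomyces": {"avg_bgcs": 30, "genome_mbp": 8.5},
    "Burkholderia": {"avg_bgcs": 16, "genome_mbp": 7.2},
    "Pseudomonas": {"avg_bgcs": 10, "genome_mbp": 6.0},
}


def bgc_density(avg_bgcs, genome_mbp):
    """Return average BGCs per Mbp (hypothetical helper, not from the thesis)."""
    if genome_mbp <= 0:
        raise ValueError("genome size must be positive")
    return avg_bgcs / genome_mbp


def rank_by_density(stats):
    """Return (genus, BGCs per Mbp) pairs sorted from highest to lowest density."""
    pairs = [(genus, bgc_density(**values)) for genus, values in stats.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for genus, density in rank_by_density(REPORTED):
        print(f"{genus}: {density:.2f} BGCs per Mbp")
```

Under these assumptions the ordering matches the abstract's observation that Streptomyces carries more BGCs than Burkholderia and Pseudomonas. Note that this is a ratio of two genus-level averages, so it is only a rough summary; a per-genome density would require the individual genome records, which the abstract does not provide.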