| Mycobacterium tuberculosis (MTB) is the causative pathogen of tuberculosis (TB). Effective anti-TB drugs have been developed since the 1950s, bringing the tuberculosis epidemic under control to some extent. However, because of neglect of collaborative TB/HIV activities, misuse of antibiotics and other factors, TB has resurged. China, India and other regions have seen an increasing number of drug-resistant MTB isolates, and lineage 2 of MTB, which is prevalent in China, is widely regarded as a successful population that appears to be more virulent and more prone to drug resistance. As a result, more and more studies have focused on the epidemiology of tuberculosis, drug resistance in Mycobacterium tuberculosis and the features of lineage 2. In this study we collected 161 clinical isolates of drug-resistant MTB in China and sequenced them with next-generation sequencing. We then carried out a genome-wide analysis of the evolution, population structure and drug resistance of M. tuberculosis. The main results of this study are as follows: 1. According to the phylogenetic tree of 183 Mycobacterium tuberculosis complex (MTBC) isolates (161 isolates from our data and 22 well-known isolates from the NCBI database), we determined that the 161 clinical isolates belonged to lineage 2 (122 isolates), lineage 3 (2 isolates) and lineage 4 (37 isolates). The number of mutations per isolate differed significantly between lineage 2 and lineage 4, but did not differ significantly among drug-susceptible (DS), multidrug-resistant (MDR) and extensively drug-resistant (XDR) isolates. 2. Based on the inferred ancestral sequences, we identified 295 SNPs as lineage 2-specific and 167 SNPs as lineage 4-specific. The ratio of G:C>T:A transversions to G:C>A:T transitions suggested that the larger number of mutations in lineage 2 may be due to oxidative damage. 
Because oxidative damage is thought to be associated with drug resistance in Mycobacterium tuberculosis, this mutation pattern may explain the phenotype of lineage 2, which tends toward mutation and drug resistance. In addition, gene function enrichment analysis showed that lineage 4-specific mutated genes were significantly under-represented in the transcription-related and the replication, recombination and repair-related COG categories, whereas lineage 2-specific mutated genes in these two categories, such as recD, Rv0922, dinG, uvrC and nth, appeared to carry function-damaging mutations. This may explain why lineage 2 readily accumulates mutations and becomes multidrug resistant. Hence these lineage-specific SNPs can not only serve as molecular markers for future high-resolution genotyping, but also provide clues for interpreting the phenotypic differences between lineage 2 and lineage 4. 3. Phylogenetic and evolutionary analysis of our isolates helped us trace the common ancestor of the Mycobacterium tuberculosis complex (MTBC). We infer that the divergence of MTBC was associated with human migration. Following early human migration, the oldest branch, lineage 5-6, diverged about 60,000 years ago, when modern humans dispersed from Africa. The Indian Ocean rim branch (lineage 1) emerged about 58,000 years ago, the European branch (lineage 4) appeared about 37,000 years ago, and finally, about 34,000 years ago, MTB split into the Central Asian branch (lineage 3) and the East Asian branch (lineage 2). 4. Focusing on the transmission history of lineage 2 and lineage 4 in our data, we inferred that the sublineages L2.1, L2.2, L2.3, L4.1 and L4.2 all originated and evolved from about ten thousand years ago (around the Neolithic Age). L2.1, a relatively ancient sublineage of lineage 2, was mainly distributed in southeastern China and was therefore thought to have been transmitted from southern China along the early "southern route" of human migration in East Asia. 
In contrast, the relatively modern sublineages (L2.2, L2.3, L4.1 and L4.2) were more likely associated with the "northern route", which ran from Central Asia or Siberia into China and then spread onward. In particular, the most recent common ancestor of sublineage L2.3 dates to about 5,200 years ago; probably because it arose within the roughly 5,000 years of Chinese civilization and was endowed with evolutionary advantages (higher virulence and adaptability), L2.3 became the prevalent sublineage in China and even began to spread worldwide. 5. In the drug resistance analysis, we first discarded phylogenetically related SNPs and synonymous SNPs. We ultimately identified 85 genes and 32 intergenic regions (IGRs) associated with drug resistance; these regions contained a higher density of nonsynonymous SNPs or IGR SNPs (Poisson distribution, P<0.05) and were mutated more frequently in drug-resistant isolates than in drug-sensitive ones (normal distribution quantile, P<0.01). These candidate drug resistance-associated regions included some well-known drug resistance-related regions (14 genes and 4 IGRs). Because combination therapy is standard clinical practice, the initial statistical analysis associated the well-known regions, such as rpoB, rpoC, katG, rpsL, rrs, embB and ethA, with all eight drugs simultaneously. 6. After further correlation analysis of the well-known regions using odds ratios (OR) and Fisher's exact test, we found two main types of mutation pattern. Essential genes of MTB showed a bias toward particular mutated sites: rpoB had two high-frequency mutation sites at codons 531 and 526, and codon 315 of katG also showed a high mutation frequency. These high-frequency mutations probably result from their low fitness cost, so that mutations in these essential genes do not seriously harm the bacterium. In contrast, nonessential genes, such as ethA and pncA, showed scattered mutations with no specific high-frequency sites. 
The previously reported genes furA and inhA were only weakly correlated with isoniazid resistance, whereas ethionamide, which is structurally similar to isoniazid, was more strongly associated with inhA. Although streptomycin (a first-line drug) and capreomycin and kanamycin (second-line drugs) all target protein synthesis, streptomycin resistance appeared to depend on rpsL mutations, while capreomycin/kanamycin resistance relied on mutations at nucleotide 1400 of rrs. The associations between embA and ethambutol and between gyrB and ofloxacin were not as strong as those of embB and gyrA. 7. The relationship between genome-wide mutations and the mechanisms of drug resistance was more complex than originally thought. According to the STITCH database, in addition to many well-known target genes, the candidates included a set of compensatory mutations and cell wall-related pathways (the fadD, pks and mmpL families, etc.). We found that the candidate drug resistance-related genes were under strong positive selection. Moreover, probably as a result of long-term drug pressure, the drug resistance-associated genes identified here likely also include 24 essential genes. Mutations in predicted sRNAs and in promoter regions within intergenic regions may represent potential novel drug resistance mechanisms that deserve greater attention. In conclusion, based on whole-genome sequencing of a population of clinical Mycobacterium tuberculosis isolates from China, we identified a number of lineage 2-specific SNPs, lineage 4-specific SNPs and a group of candidate drug resistance-associated regions. The results illustrate the close relationship between TB transmission and the history of human migration in China, indicate that the genetic basis of drug resistance is more complex than previously anticipated, and provide a strong foundation for elucidating unknown drug resistance mechanisms. 
Furthermore, this study may help to predict future patterns of tuberculosis epidemics and to design rational strategies for the diagnosis, treatment and prevention of tuberculosis, and may also promote multi-omic research on Mycobacterium tuberculosis. |
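
The statistical tests named in results 5 and 6 above (a Poisson test on SNP density, then odds ratios with Fisher's exact test) can be illustrated with a minimal Python sketch. This is not the study's pipeline: the function names, the background model (genome-wide SNP rate multiplied by region length), the 0.5 correction for zero cells and every count used below are assumptions made purely for illustration.

```python
from math import comb, exp


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam).

    Assumed use: k is the number of nonsynonymous (or IGR) SNPs observed
    in a region, lam is the number expected from a background rate.
    """
    term = exp(-lam)  # P(X = 0)
    below = 0.0       # accumulates P(X < k)
    for i in range(k):
        below += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - below)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: mutated / not mutated; columns: resistant / susceptible.
    If any cell is zero, 0.5 is added to every cell (an assumed,
    commonly used correction to avoid division by zero).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical numbers, not results from the study.
    expected = 0.002 * 1500  # assumed SNPs per bp x region length in bp
    print("Poisson P(X >= 9):", poisson_upper_tail(9, expected))

    # Hypothetical table: 30 resistant and 5 susceptible isolates carry a
    # mutation in the region; 10 resistant and 40 susceptible do not.
    table = (30, 5, 10, 40)
    print("OR:", odds_ratio(*table))
    print("Fisher P:", fisher_exact_two_sided(*table))
```

A region would pass a filter of this kind only if its SNP density exceeds the background expectation and its mutations are enriched in resistant isolates. In practice, a library routine such as `scipy.stats.fisher_exact` would normally replace the hand-written test.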