Population Evolution of Vibrio parahaemolyticus Based on Whole-Genome Sequencing

Posted on: 2016-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X W Yang
GTID: 2134330461493453
Subject: Military Preventive Medicine
Abstract/Summary:
Vibrio parahaemolyticus (VP) is a halophilic bacterium that is widely present in seawater and seafood. It is one of the main pathogens causing foodborne illness in the coastal areas of China. The O and K antigens are the major antigens of VP, and the diversity of these antigens has traditionally been used for strain typing. Population genetics research at the whole-genome level would improve our understanding of the evolution and transmission of this species. In addition, a distinctive feature of VP is its high frequency of homologous recombination, so studying its population structure and evolutionary dynamics from whole-genome sequences would further illuminate the role of recombination in bacterial evolution.

In this study, 157 VP strains collected from different regions worldwide and at different isolation times were subjected to whole-genome sequencing. Comprehensive population genetic analyses were performed to address the following three questions: (i) the basic characteristics of genetic diversity in the entire population; (ii) the population structure of all strains and the evolution of the epidemic clone; (iii) the estimation of the effective population size and the evolutionary forces.

The Identification of Genome-wide Variation

According to their presence or absence in the genome of each isolate, genome sequences can be divided into the core genome and the pan-genome. The sizes of the core genome and the pan-genome are 4.07 Mbp and 17.33 Mbp, respectively. A single nucleotide polymorphism (SNP) is a sequence polymorphism caused by a change in a single nucleotide. The software MUMmer and SOAP were used to detect SNPs in the whole-genome sequences. In total, 327,904 high-quality biallelic SNPs were identified in the core genome of the 157 strains, which supported the subsequent phylogenetic reconstruction and the estimation of population genetic parameters.
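To make the notions of a biallelic core-genome SNP and of a pairwise SNP distance concrete, the following is a minimal sketch on a toy alignment. It is not the MUMmer/SOAP pipeline used in the thesis; the function names, the strain names and the toy sequences are all hypothetical, and the sketch assumes the core-genome sequences are already aligned to equal length.

```python
from itertools import combinations


def biallelic_snp_columns(alignment):
    """Return the column indices that carry exactly two alleles.

    `alignment` maps a strain name to an aligned sequence (assumption:
    all sequences have equal length). Columns containing anything other
    than A, C, G or T (for example N or a gap) are skipped.
    """
    valid = set("ACGT")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    sites = []
    for i in range(length):
        alleles = {seq[i] for seq in alignment.values()}
        if alleles <= valid and len(alleles) == 2:
            sites.append(i)
    return sites


def pairwise_snp_distance(alignment, sites):
    """Count differing SNP sites for every pair of strains."""
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(alignment[a][i] != alignment[b][i] for i in sites)
    return distances


# Hypothetical toy data, not taken from the thesis.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGAAC",
    "strain_C": "ATGTACGAAC",
    "strain_D": "ATGTNCGAAT",
}

snp_sites = biallelic_snp_columns(toy_alignment)
snp_distances = pairwise_snp_distance(toy_alignment, snp_sites)
print(snp_sites)
print(snp_distances)
```

A distance matrix of this kind is the usual input to neighbor-joining tree construction, and averaging it within and between groups gives figures comparable in kind to the average genetic distances reported in the abstract.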
The Identification of Epidemic Clone and Phylogenetic Analysis

Based on more than 300,000 SNP loci, a phylogenetic tree of the 157 strains was constructed using the neighbor-joining method. Because of the high recombination frequency in the population, the relationships among strains are unclear: the internal nodes and deep branches of the tree show low bootstrap support and short branch lengths, and the tree has a star-like topology. Although the phylogenetic tree cannot give a clear evolutionary relationship, 21 distinct clonal groups were identified. Together, the clonal groups include 107 strains, and the average genetic distance among strains within them is 281 SNPs, far less than 35,500 SNPs, the average genetic distance in the whole population.

For clonal groups, the evolutionary time is relatively short, so fewer recombination events are observed. Recombination within a clonal group can therefore be detected from the density of SNP loci. This study used the PAML and RecHMM software to estimate the recombination events that occurred in the 21 clonal groups. The sizes and positions of the recombination events were identified, and a recombination hotspot surrounding the O- and K-antigen encoding gene cluster was found in the pandemic group. The hotspot is very long, 158.5 kbp in size. After excluding the impact of these recombination events, the phylogeny of the epidemic clonal group was reconstructed. Except for one sublineage found only in Taiwan, the sublineages in this tree were present in multiple geographic locations. This geographic distribution implies a very rapid spread of the pandemic group.

Population Structure and Oceanic Gene Pools

The degree of recombination is reflected by linkage disequilibrium (LD). To quantify the degree of recombination, the LD levels of V. parahaemolyticus, E. coli, H. pylori, N. meningitidis, S. aureus, S. pneumoniae and V. cholerae were calculated using Haploview.
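As an illustration of how LD between SNP pairs can be quantified and summarized by distance, here is a minimal sketch using the standard r^2 statistic for haploid data. It does not reproduce Haploview's implementation; the function names, the 0/1 allele encoding, the bin size and the toy data are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, each a list of 0/1 alleles per strain."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of strains carrying allele 1 at both sites.
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def ld_decay(positions, sites, bin_size):
    """Mean r^2 of all SNP pairs, grouped by physical distance bin.

    Returns a dict mapping the lower edge of each distance bin (in bp)
    to the mean r^2 of the pairs falling in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, j in combinations(range(len(positions)), 2):
        bin_index = abs(positions[j] - positions[i]) // bin_size
        sums[bin_index] += r_squared(sites[i], sites[j])
        counts[bin_index] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical toy data: three SNPs typed in four strains.
toy_positions = [100, 150, 900]
toy_sites = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # perfectly linked with the first SNP
    [0, 1, 0, 1],  # independent of the first two SNPs
]

print(ld_decay(toy_positions, toy_sites, bin_size=250))
```

In a highly recombining population the mean r^2 is expected to fall quickly as the distance bin increases, which is the pattern the LD comparison between species examines.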
In the LD comparison across these species, LD in VP decays to less than half of its initial value within 250 bp. Only H. pylori and E. coli show a faster initial decay than VP. In addition, the LD value between distant SNPs in VP is lower than in any other species. In VP, the observed LD value is close to the value expected for a fully panmictic population.

To investigate gene flow and the structure of the whole population, a nonredundant dataset of 71 strains was selected, comprising 50 non-clonal strains and one representative from each of the 21 clonal groups. Using ChromoPainter and fineSTRUCTURE, 11 populations were detected in the nonredundant dataset: 8 Asia-pops, composed mainly of strains isolated from Asia; 1 US-pop, composed of strains isolated from North America; and 2 Hyb-pops, which are mixed populations. The eight Asia-pops, the US-pop and the two Hyb-pops contain 51, 3 and 5 strains, respectively. The Fst value between Asia-pop and US-pop is 0.071, and the value is also low between other populations. This indicates that the nucleotide diversity should be similar between Asia-pop and the whole population. The estimated migration rates imply that migration events are rare. Therefore, the strains in Asia-pop and US-pop have different gene pools, which may be due to geographical separation.

The number of strains isolated from North America is limited. To make the result more reliable, MLST data for 648 VP strains, obtained from PubMLST and based on 7 housekeeping genes, were added. Taking the results of STRUCTURE and ChromoPainter into consideration, most of the STs with high US-pop ancestry were isolated from the Gulf of Mexico. Conversely, most STs isolated from the Gulf of Mexico are associated with US-pop. VP is a marine bacterium and, according to the geographical distribution of the population, two oceanic gene pools were identified.
These are the Asia-Pacific coast gene pool and the Gulf of Mexico gene pool.

The Dynamics of VP Evolution

In this study, the 51 strains in Asia-pop form a single freely mixing population. Population genetic methods can be used to estimate Ne·r in Asia-pop, where Ne is the effective population size and r is the recombination rate per site per generation. This parameter was estimated in two different ways. The value estimated from individual sites is 9.8, and the value estimated from whole organisms is 268. Under a neutral population genetic model these two estimates should be similar, but they differ by a factor of 27. Selection at the organismal level may cause this result. The VP population has diversified into ecologically distinct niches, called ecotypes. Strains of different ecotypes have a low coalescence rate, but this may not prevent the exchange of variants outside the ecotypically selected genome regions.

Under the ecotype model there should be epistatic loci in the genome, so such loci can serve as evidence for the model. All pairs of SNPs were tested with 9.8 billion Fisher's exact tests, and the distribution of p-values was visualized in a Q-Q plot. A group of SNPs located 400 kbp apart in the genome showed a significant association, with p < 10^-9. These SNPs are epistatic loci. The analysis of associations between SNPs and genomic fragments also provides evidence of epistasis. In this study, the main products of these epistatic loci involve the type VI secretion system (T6SS) and c-di-GMP, which can regulate genes related to biofilm formation. In this dataset, only one of the two systems was observed in any single strain. This implies that the epistatic loci are under strong selection pressure, which may be closely related to the fitness of strains in different niches.
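The pairwise SNP test described above can be sketched for a single pair of sites as follows. This is a minimal, standard-library illustration of Fisher's exact test on a 2x2 table of allele combinations. The abstract does not state whether the tests were one- or two-sided, so a two-sided test is assumed here, and the function names and toy genotypes are hypothetical.

```python
from math import comb


def contingency(site_a, site_b):
    """2x2 table (a, b, c, d) of allele combinations for two 0/1-coded SNPs."""
    pairs = list(zip(site_a, site_b))
    a = sum(1 for x, y in pairs if x == 0 and y == 0)
    b = sum(1 for x, y in pairs if x == 0 and y == 1)
    c = sum(1 for x, y in pairs if x == 1 and y == 0)
    d = sum(1 for x, y in pairs if x == 1 and y == 1)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical toy data: two SNPs in perfect association across 20 strains.
site_1 = [0] * 10 + [1] * 10
site_2 = [0] * 10 + [1] * 10

table = contingency(site_1, site_2)
print(table, fisher_exact_two_sided(*table))
```

With billions of pairwise tests, a stringent threshold such as the p < 10^-9 quoted in the abstract is needed to keep false positives under control. A genome-scale analysis would in practice rely on an optimized implementation rather than a pure-Python loop like this one.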
Keywords: Vibrio parahaemolyticus, whole-genome sequencing, population structure, homologous recombination, population genetics