
Genome-wide Population Genetics Analysis of Plasmodium falciparum Isolates from the China-Myanmar Border

Posted on: 2019-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Tian
GTID: 2404330551955965
Subject: Pathogen Biology
Abstract/Summary:
Malaria, a disease caused by infection with Plasmodium parasites, is one of the most devastating human parasitic diseases and a threat to global public health. Great progress has been made in international malaria control in recent years, but many problems remain. Artemisinin-based combination therapies are now recommended by the WHO as first-line treatment for uncomplicated falciparum malaria in all malaria-endemic areas. However, P. falciparum isolates have been reported to be less sensitive to artemisinin and its derivatives in Southeast Asia and along the Yunnan border of China, which has raised alarm for malaria control. Although whole-genome sequencing data are available for thousands of P. falciparum strains from endemic areas in Southeast Asia and Africa, little genetic information has been reported for strains from the China-Myanmar border. In this study, we carried out an in-depth genetic analysis of P. falciparum strains from the China-Myanmar border, identified genetic differences from strains from other regions, and established a discriminant model to distinguish geographic populations.

We collected 44 blood samples of P. falciparum isolates from the Lazan area of the China-Myanmar (CM) border. After in vitro culture and genomic DNA extraction, all 44 samples underwent whole-genome sequencing with Illumina technology. FASTQ files for additional isolates were downloaded from the Sequence Read Archive (SRA) of the European Nucleotide Archive: 40 from the Thailand-Myanmar border (TM), 60 from the Thailand-Cambodia border (TC), and 40 from West Africa (WAF). We excluded isolates with less than 70% mapping to the PF3D7 reference genome, sequencing depth below 30-fold, or evidence of multiple infection (multiplicity of infection). A total of 163 isolates remained for further analysis: 34 from CM, 40 from TM, 56 from TC, and 33 from WAF. SNPs were called with GATK version 3.0 following the GATK Best Practices. Across the 163 samples, we retained 150,761 high-quality SNPs in the chromosomal genome that met the filtering criteria (59,720 in CM, 42,264 in TM, 39,196 in TC, and 90,669 in WAF). Of these, 11,849 SNPs were shared by all four populations and 21,662 were detected only in CM.

The average pairwise nucleotide diversity (π) was calculated within each geographic population using the R package PopGenome. Nucleotide diversity in CM was similar to that in WAF and TM but significantly higher than that in TC. Linkage disequilibrium (LD) was calculated using PopLDdecay, and an LD map was constructed for the four populations based on r² values. r² decreased as the distance between sites increased, and it decayed fastest in WAF, followed by CM, TM, and TC. The distance at which r² decayed to 0.2 was 120 bp, 250 bp, 350 bp, and 70 bp for CM, TM, TC, and WAF, respectively. These results indicate high levels of haplotypic diversity in all four populations, with the highest level in WAF. Tajima's D and Fu's D for CM, also calculated with PopGenome, were both significantly negative, which indicates historical and recent population expansion in CM.

Evidence for selection due to drug pressure or other mechanisms was investigated using the intra-population iHS metric and the inter-population XP-EHH metric, both calculated with the R package rehh. Strong iHS hits from the intra-population analysis covered 78 genes in the CM population, 32 of which contained two or more SNPs. The genes under positive selection included those encoding vaccine candidates (e.g., ama1 and trap), drug-resistance genes (e.g., ubp1), SURFIN family genes, and other membrane and surface proteins (e.g., clag3.2, celtos). Inter-population analysis with XP-EHH can detect positively selected alleles that have already reached fixation. With CM as the reference population, trap, trep, and ark3 showed strong signals in all three other populations, and nine genes (e.g., ama1, ruvb1, degp, jmjc1) showed strong signals in both TM and TC. This analysis also identified positive selection in 41 genes (e.g., alg7, apiap2, cg1, glp3, acs, clag9, surfin8) in WAF compared with the CM population.

Wright's fixation index (Fst), a measure of population differentiation due to genetic structure, was calculated using PopGenome. Overall pairwise differentiation between populations was directly proportional to the geographic distance between sites: it was lowest between CM and TM and highest between WAF and TC. We then identified the major geographical divisions of parasite population structure by principal component analysis (PCA) and neighbor-joining analysis. PCA was performed with PLINK and GCTA, and the neighbor-joining tree was constructed with MEGA 6.0. In the PCA plots, the first two components (C1 and C2) clustered P. falciparum isolates from the four populations mostly according to geographic origin; the only exception was one TM sample that clustered with CM. A standard neighbor-joining phylogenetic analysis revealed a very similar clustering pattern, with only two TM samples clustering with the CM samples. We used the ADMIXTURE software package to detect ancestry shared between isolates. The optimal number of clusters (K = 6) was determined by running the software multiple times under different values of K (2-10). Consistent with the PCA and phylogenetic analysis, the model-based clustering in ADMIXTURE defined WAF, CM, and TM as distinct populations based on the whole-genome data set and revealed three sub-populations within TC.

We detected genetic differences by pairwise comparison of SNPs among the four populations using PLINK and annotated the top 100 loci from each pairwise comparison. Among the 600 significantly different SNPs, 132 were in conserved regions. From these 132 coding loci, we built a discriminant model containing 33 loci by stepwise discriminant analysis in SPSS. We then tested the model fit by back substitution and with new samples. The accuracy in the retrospective test, the cross-validation test, and the new-sample test was 100%, 96.69%, and 88%, respectively.
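
To make the diversity and neutrality statistics mentioned in the abstract more concrete, the following is a minimal Python sketch of average pairwise nucleotide diversity (π) and Tajima's D computed from a small biallelic haplotype matrix. It is not the PopGenome implementation used in the thesis. The function names, the 0/1 allele coding, and the toy data are assumptions made purely for illustration, and each isolate is treated as a single haploid sequence.

```python
from math import sqrt


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes.

    `haplotypes` is a list of equal-length lists of 0/1 alleles
    (hypothetical coding: 0 = reference allele, 1 = alternative allele).
    """
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    total = 0.0
    for site in zip(*haplotypes):      # iterate over columns (sites)
        alt = sum(site)                # copies of allele 1 at this site
        total += alt * (n - alt) / n_pairs
    return total


def nucleotide_diversity(haplotypes, n_sites_analysed):
    """Per-site pi: mean pairwise differences divided by sequence length."""
    return mean_pairwise_differences(haplotypes) / n_sites_analysed


def tajimas_d(haplotypes):
    """Tajima's D (Tajima 1989); returns None when it is undefined."""
    n = len(haplotypes)
    # Number of segregating sites (both alleles present in the sample).
    s = sum(1 for site in zip(*haplotypes) if 0 < sum(site) < n)
    if n < 3 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    k = mean_pairwise_differences(haplotypes)
    return (k - s / a1) / sqrt(variance)


# Toy data: four haplotypes, three sites, every variant a singleton.
singletons = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(nucleotide_diversity(singletons, n_sites_analysed=3))
print(tajimas_d(singletons))
```

In the toy matrix every variant is carried by a single haplotype, so the excess of rare alleles pulls D below zero. This is the same direction of signal that the abstract interprets as population expansion in CM, although real analyses also need significance testing and careful handling of missing data.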
Keywords/Search Tags: P. falciparum, genome, SNP, population structure analysis