
QTL Mapping For Agronomic Traits Based On Gossypium Mustelinum Genome Assembly And Introgression Lines In Cotton

Posted on: 2022-03-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yang
GTID: 1523306842462604
Subject: Crop Genetics and Breeding

Abstract/Summary:
Cotton (Gossypium spp.) is one of the most important cash crops cultivated worldwide. However, modern cotton germplasm has very little genetic diversity, and deploying the unexploited gene pools of distant species has emerged as an essential option for broadening it. G. mustelinum (Gm) represents the earliest-diverging evolutionary lineage of polyploid Gossypium, and populations of this species appear to have developed genotypes adapted to adverse conditions. Moreover, large-effect loci at which Gm alleles increase fiber quality have been detected. However, recombination variation in interspecific hybrids between G. hirsutum (Gh) and Gm, and the effects of fiber quality QTLs introgressed from Gm, remain poorly understood. To use Gm to broaden the genetic diversity of Gh, create new breeding materials and explore novel alleles, an introgression line (IL) population between Gm and Gh was developed. In the present study, we constructed a reference-grade genome assembly of Gm by integrating single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C). We then resequenced the population of Gm ILs previously constructed by our group to accurately identify the introgression segments. QTL mapping was further carried out using field phenotypes of 14 important agronomic traits (6 fiber quality, 4 yield and 4 plant architecture traits) collected in 4 independent environments. The main results are as follows:

1. Genome assembly, validation and annotation of Gm
To assemble the Gm genome, 232 Gb of PacBio long reads were error-corrected with 87.67 Gb of Illumina paired-end short reads and assembled de novo into 1,509 polished contigs with a contig N50 of 9.51 Mb. With the aid of 215 Gb of Hi-C data, 867 contigs (2,278.34 Mb), covering 96.64% of the estimated genome, were assigned and ordered into 26 chromosome-scale scaffolds. In total, 99.02% of the Illumina PE reads mapped properly to the Gm assembly, and BUSCO analysis found 98.2% of the conserved genes to be complete. Centromeric regions of each chromosome were identified by analyzing centromere-related long terminal repeat (LTR) retrotransposons, and the LTR assembly index (LAI) of the assembly was 20.12. These results demonstrate the high completeness and accuracy of the Gm genome assembly. Based on this highly contiguous assembly, 70,776 protein-coding genes were predicted. Repetitive elements accounted for 71.2% of the Gm genome; the most abundant class was LTR retrotransposons, comprising 63.22% of the genome.

2. Genome divergence between Gm and Gh
Whole-genome comparisons among Gm, Gh, G. arboreum (Ga) and G. raimondii (Gr) showed that the A genome and At subgenomes had more large-scale rearrangements than the D genome and Dt subgenomes. LTR retrotransposons have undergone continuing, more recent amplification bursts from 0 to 2 million years ago (MYA) in the Gh, Gm and Ga genomes, but not in Gr. We further detected presence/absence variations (PAVs) among Gh, Gm, Ga and Gr. Consistent with the transposable element (TE) amplification pattern, the number and length of PAVs were approximately equal among Ga and the two tetraploid At subgenomes. Among Gr and the two tetraploid Dt subgenomes, however, PAVs between the Gm and Gh Dt subgenomes were far more numerous and longer than those between Gr and the Gm Dt subgenome or between Gr and the Gh Dt subgenome. Almost all PAVs were covered by TEs, and the LTR retrotransposons involved in PAVs had later insertion times than those in other regions. These results suggest that TEs were one of the major sources of genetic variation during the evolution of polyploid cotton species. Gm species-specific genes were more enriched for PAVs overlapping their coding sequences and had a significantly higher Ka/Ks ratio. Gm-specific genes overlapping structural variations (SVs) were enriched in categories related to stress response.

3. Identification of introgression segments and crossover regions in the Gm IL population
We constructed 285 Gm introgression lines (BC4F5) using Gh cv. 'Emian22' as the recipient parent. Based on 5× coverage Illumina paired-end sequencing, 1,662 Gm introgression segments were identified in 264 ILs, and 83% of the crossover (CO) regions delimiting the introgression segments were shorter than 5 kb. The CO positions in the 264 ILs divided the genome into 2,341 recombination bins covering 94.92% of the Gm genome, and 75.94% of the bins were smaller than 0.5 Mb. Mapping of two fuzz traits (green fuzz and fuzzless) showed that the high-quality genotype data of the population allow target traits to be mapped at high resolution.

4. Identification of stable QTLs in the Gm ILs
A total of 88, 54 and 35 QTLs for fiber quality, yield-related and plant architecture traits, respectively, were detected across the 4 environments. For upper half mean length (UHML), fiber elongation (FEL), micronaire (MIC) and fiber strength (FS), more than 50% of the QTLs had favorable alleles from Gm, indicating the potential of Gm for fiber quality improvement. Thirteen stable QTLs were detected across different environments, and their associated recombination bins ranged from 26 kb to 1.2 Mb. qUHML-D05, an additive QTL detected in all environments, increased UHML, explaining 4.08% to 12.03% of the phenotypic variation (PVE). qFEL-A11 was another stable QTL detected in all environments, with additive effects increasing FEL by 10.36% to 21.71%. qUHML-D10, which had a decreasing additive effect, was detected in SHZ17, WH17 and WH18 and decreased UHML with PVE values from 5.02% to 6.78%. Meanwhile, qSFC-D10, a QTL for short fiber content (SFC), was detected in WH17 and WH18 and, through additive effects, explained 6.41% and 7.7% of the observed phenotypic variation, respectively. Three annotated genes were located in the 26 kb bin associated with qUHML/SFC-D10. One of them, Gmus_D10G10333, encodes the heat stress transcription factor HSFA4A and carries six InDels between -1,275 and -1,164 bp upstream of the gene; it was considered the best candidate gene. RNA-seq data showed that in developing fibers at 15 days post-anthesis (DPA), Gmus_D10G10333 expression in the corresponding IL was 9-fold higher than in the control, consistent with expectations.
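Contig N50, the contiguity statistic reported for the Gm assembly, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below computes it from a list of contig lengths; the function name `n50` and the toy lengths are illustrative assumptions, not code or data from the dissertation.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest and their lengths summed;
    N50 is the length of the contig at which the running total first
    reaches half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached when all lengths are non-negative


if __name__ == "__main__":
    # Toy contig lengths in Mb (illustrative only, not the Gm assembly).
    print(n50([12, 10, 9, 5, 3, 1]))
```

For the toy input the total is 40 Mb; the running sum first reaches 20 Mb at the 10 Mb contig, so the sketch returns 10.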
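The abstract describes how crossover (CO) positions observed across the 264 ILs partition the genome into recombination bins. A common way to build such a bin map is to pool every distinct breakpoint from all lines on a chromosome and treat each interval between consecutive breakpoints as one bin, since no line changes genotype inside it. The Python sketch below follows that idea under simplifying assumptions: each CO region is reduced to a single representative position (for example, its midpoint), chromosomes are processed one at a time, and the helpers `recombination_bins` and `fraction_smaller_than` and the example coordinates are hypothetical, not the procedure or data used in the dissertation.

```python
from typing import Dict, List, Tuple

Bin = Tuple[int, int]  # half-open interval [start, end) in bp


def recombination_bins(crossovers: Dict[str, List[int]], chrom_length: int) -> List[Bin]:
    """Partition one chromosome into recombination bins.

    `crossovers` maps each introgression line (IL) to the crossover (CO)
    positions observed on this chromosome. Every distinct CO position from
    any line becomes a bin boundary, so no line changes genotype inside a bin.
    """
    breakpoints = sorted(
        {pos for positions in crossovers.values() for pos in positions if 0 < pos < chrom_length}
    )
    edges = [0, *breakpoints, chrom_length]
    return list(zip(edges[:-1], edges[1:]))


def fraction_smaller_than(bins: List[Bin], max_size: int) -> float:
    """Fraction of bins whose length is below `max_size` bp."""
    if not bins:
        return 0.0
    return sum(1 for start, end in bins if end - start < max_size) / len(bins)


if __name__ == "__main__":
    # Hypothetical CO positions (bp) for four ILs on a 5 Mb chromosome.
    cos = {
        "IL1": [1_000_000, 3_500_000],
        "IL2": [3_500_000],
        "IL3": [2_200_000],
        "IL4": [3_800_000],
    }
    bins = recombination_bins(cos, 5_000_000)
    print(bins)
    print(fraction_smaller_than(bins, 500_000))
```

Here IL1 and IL2 share the breakpoint at 3.5 Mb, so it defines only one boundary; the four distinct breakpoints yield five bins, one of which (3.5 to 3.8 Mb) is smaller than 0.5 Mb. The sketch tiles the entire chromosome, so it does not reproduce coverage figures such as 94.92%, which depend on how regions without introgression are handled.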
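The PVE values quoted for the stable QTLs express how much of the phenotypic variance a locus accounts for. As a simplified illustration only, the Python sketch below fits a single-locus least-squares regression of a trait on a 0/1 genotype code at one bin (0 = Gh recipient allele, 1 = Gm donor allele) and reports PVE as 100 × R². This is an assumed textbook model, not the QTL mapping method used in the dissertation, and the function name `single_locus_fit` and the toy phenotypes are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def single_locus_fit(genotypes: Sequence[int], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Regress phenotype on a 0/1 genotype code at one locus.

    Returns (effect, pve): `effect` is the fitted slope, which for 0/1
    coding equals mean(donor class) - mean(recipient class); `pve` is
    100 * R^2, the percentage of phenotypic variance explained.
    """
    if len(genotypes) != len(phenotypes) or len(genotypes) < 2:
        raise ValueError("need matching genotype/phenotype lists of length >= 2")
    g_mean, p_mean = fmean(genotypes), fmean(phenotypes)
    s_gg = sum((g - g_mean) ** 2 for g in genotypes)
    if s_gg == 0:
        raise ValueError("locus is monomorphic; effect is not estimable")
    s_gp = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    slope = s_gp / s_gg
    intercept = p_mean - slope * g_mean
    # Residual and total sums of squares give R^2 = 1 - SS_res / SS_tot.
    ss_res = sum((p - (intercept + slope * g)) ** 2 for g, p in zip(genotypes, phenotypes))
    ss_tot = sum((p - p_mean) ** 2 for p in phenotypes)
    pve = 100.0 * (1.0 - ss_res / ss_tot) if ss_tot > 0 else 0.0
    return slope, pve


if __name__ == "__main__":
    # Hypothetical fiber-length values (mm) for six lines.
    geno = [0, 0, 0, 1, 1, 1]
    pheno = [29.0, 30.0, 31.0, 31.0, 32.0, 33.0]
    effect, pve = single_locus_fit(geno, pheno)
    print(f"effect = {effect:.2f} mm, PVE = {pve:.1f}%")
```

For the toy data the donor class mean (32 mm) exceeds the recipient class mean (30 mm) by 2 mm, and the locus explains 60% of the variance. Many QTL studies instead define the additive effect at a homozygous locus as half the difference between the two homozygote means, so effect conventions should be checked before comparing numbers.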
Keywords/Search Tags: Gossypium mustelinum, PacBio, Hi-C, genome assembly, introgression lines, resequencing, quantitative trait loci