
Analysis Of Genetic Diversities Of Rapeseed And Rice With Third-generation Sequencing

Posted on: 2021-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J M Song
Full Text: PDF
GTID: 1363330611483350
Subject: Bioinformatics
Abstract/Summary:
Brassica napus (B. napus) is the second most important oilseed crop, and Oryza sativa (O. sativa) is one of the major food crops in the world. Although reference genomes have been released for both species, a single reference genome is not sufficient to fully capture intraspecific genetic diversity. In this study, eight high-quality reference genomes of B. napus and two of rice were constructed using third-generation sequencing technologies. Based on these assemblies, the origin of the subgenomes and the whole-genome duplications of B. napus were analyzed. A wide range of genetic variations in B. napus was also identified, and a gene index and pan-genome of B. napus were constructed. We further analyzed the relationship between structural variations and several important agronomic traits. In addition, the intraspecific genetic variation and gene family expansion of O. sativa were analyzed, and a comprehensive bioinformatics platform for O. sativa subsp. indica was constructed. The main results are as follows:

1. Construction of eight high-quality reference genomes of B. napus
In this study, PacBio, Hi-C, BioNano, and Illumina sequencing technologies were used to assemble eight B. napus genomes, comprising four semi-winter, two spring, and two winter rapeseeds, which represent the major ecotypes worldwide. All eight de novo assemblies reached chromosome level, with contig N50 values of 2.1-3.1 Mb. Independent validation with the core gene set, BAC end sequencing, BioNano, Hi-C, and RNA-Seq data showed that the eight reference genomes were highly accurate and complete. Genome annotation showed that each of the eight B. napus genomes contained 94,586-100,919 coding genes, and transposable elements (TEs) accounted for 56.8-58.2% of the assembled genomes. We also found that the amplification of long terminal repeat retrotransposons (LTR-RTs) started early and lasted a long time in the C subgenome, making the C subgenome larger than the A subgenome. The Hi-C map of B. napus showed clear A/B compartment characteristics: the B compartment was concentrated in the centromere regions, and the A compartment was mainly distributed on chromosomal arms with higher gene density.

2. Analysis of intraspecific genetic variation and construction of the pan-genome of B. napus
A phylogenetic tree of Brassicaceae was constructed from single-copy orthologous genes. Accessions of the same ecotype clustered together, and the synthetic accession was closer to the diploid ancestors. We estimated the timing of the whole-genome triplication and differentiation events in B. napus by synonymous substitution rate analysis. The results showed that B. napus formed from a cross between B. rapa and B. oleracea about 10,000 years ago, that B. rapa and B. oleracea diverged ~3 million years ago (MYA), that a whole-genome triplication of Brassica species occurred ~11 MYA, and that Arabidopsis diverged from Brassica about 14 MYA. Additionally, we analyzed single nucleotide polymorphism (SNP) information from 210 B. napus accessions, 199 B. rapa accessions, 119 B. oleracea accessions, and the 8 B. napus accessions assembled in this study. The A subgenome originated from turnip, but the origin of the C subgenome remained unclear. By comparison with the ZS11 genome, we identified 7.5-15.6 Mb of inversions, 39.7-49.1 Mb of translocations, 77.2-149.6 Mb of presence/absence variations (PAVs), and a series of SNPs and small insertions/deletions (InDels) in the other 7 B. napus genomes. These variations had a large effect on more than 9.4% of the coding genes. By combining the resequencing data of 1,688 rapeseed varieties with the eight reference genomes, we constructed the pan-reference genome of B. napus, with a total length of about 1.8 Gb and 121,789 coding genes. At the gene family level, the pan-genome of B. napus contained 105,672 gene families, of which about 56% were core gene families and about 42% were dispensable gene families. Specific gene families were enriched in functions such as 'response to stimulation or stress' and 'protein phosphorylation'. To facilitate gene comparison and the retrieval of genes of interest among different rapeseed accessions, we also constructed the first gene index of B. napus, which contained mapping information for 88,345 coding genes. These data were stored in the open-access B. napus pan-genome database, which provides a valuable resource for the genetic improvement of rapeseed.

3. Analysis of the genetic basis of phenotypic differences based on PAV-GWAS
To explore the contribution of structural variation to trait differences, this study conducted a genome-wide association study (GWAS) on three important yield-related traits: seed length, seed weight, and flowering time. A total of 27,216 polymorphic PAVs were obtained in a nested association mapping population using ZS11 as the donor. Based on these, we used PAV-based genome-wide association analysis (PAV-GWAS) to directly determine the causal structural variations for seed length, seed weight, and flowering time, indicating that PAV-GWAS can complement SNP-GWAS in identifying associations with traits. In-depth analysis showed that PAVs in the three FLOWERING LOCUS C (FLC) genes were closely related to the flowering time and ecotype differentiation of B. napus. In particular, the structural variation of the ChrA10.FLC gene was highly correlated with ecotype division, which provides new insights into the genetic basis of ecotype differentiation in B. napus.

4. Construction of indica reference genomes
In this study, PacBio, BioNano, and Illumina sequencing technologies were used to perform whole-genome sequencing of O. sativa L. ssp. indica ZS97 and MH63. Two high-quality next-generation indica reference genomes were assembled, in which 60,897 and 60,123 protein-coding genes were annotated, respectively. The more complete reference genomes allowed a more comprehensive analysis of repeat elements and LTR-RT insertion burst events. Repeat elements made up approximately 45% of the indica genomes, and their distribution characteristics were examined. We identified 1.28 million SNPs, 0.32 million InDels, and 23.38-24.83 Mb of PAVs between the ZS97 and MH63 genomes. In the ZS97 and MH63 genomes, 6,108 and 6,270 non-TE genes, respectively, were classified as divergent genes affected by these mutations. Hot spots of PAVs appeared at the end of chromosome Chr11, which may be related to the abundance of R gene clusters and recent gene duplication in this region. To facilitate the use of the indica reference genomes, we built an indica bioinformatics platform that integrates multi-omics resources and computing tools.
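Two of the summary statistics quoted above, contig N50 and the split of gene families into core, dispensable, and specific classes, can be illustrated with a short sketch. This is not the pipeline used in the dissertation. The function names, the toy input values, and the accession labels `acc_B` and `acc_C` are hypothetical, and the class thresholds (core = present in every genome, specific = present in exactly one, dispensable = everything in between) are one common convention assumed here, since the abstract does not state the cut-offs it used.

```python
def contig_n50(lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def classify_gene_families(presence, n_genomes):
    """Label each gene family as 'core', 'dispensable', or 'specific'.

    presence maps a family ID to the set of genomes that carry the family.
    Thresholds are an assumption (see the lead-in), not the author's stated rule.
    """
    labels = {}
    for family, genomes in presence.items():
        count = len(genomes)
        if count == 0:
            raise ValueError(f"family {family!r} is absent from every genome")
        if count == n_genomes:
            labels[family] = "core"
        elif count == 1:
            labels[family] = "specific"
        else:
            labels[family] = "dispensable"
    return labels


if __name__ == "__main__":
    # Toy contig lengths (arbitrary units), not data from the study.
    print(contig_n50([5, 4, 3, 2, 1]))  # prints 4

    # Toy presence/absence table over three genomes (hypothetical labels).
    toy = {
        "fam1": {"ZS11", "acc_B", "acc_C"},
        "fam2": {"ZS11", "acc_B"},
        "fam3": {"acc_C"},
    }
    print(classify_gene_families(toy, n_genomes=3))
```

In the toy N50 example the total length is 15, and the two longest contigs (5 + 4 = 9) are the first to cover at least half of it, so the N50 is 4.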
Keywords/Search Tags: B. napus, O. sativa, genetic diversity, pan-genome, PAV-GWAS