Genetic Differentiation And Domestication Of Allotetraploid Cotton

Posted on: 2017-02-28  Degree: Doctor  Type: Dissertation
Country: China  Candidate: C X Liu
GTID: 1363330575477128  Subject: Crop Genetics and Breeding
Abstract/Summary:
Cotton has been domesticated from a perennial tree into an annual crop. Two allotetraploid species, Gossypium hirsutum and G. barbadense, were domesticated after polyploidization and are cultivated worldwide. The domestication of Sea Island cotton and Upland cotton has long been debated. Early studies based on various molecular markers and isozymes were limited by the small amount of data available. Whole-genome resequencing has been applied to many crops, but the overall genetic diversity between and within the cultivated cotton species remains poorly understood. Here we resequenced the genomes of 147 diverse wild relatives, landraces, and modern cultivars at approximately fivefold coverage, generating a total of 1.8 terabases of raw sequence data, aligned the reads to the TM-1 reference genome sequence, and found distinct patterns of divergence between G. hirsutum and G. barbadense. Two accessions with recently sequenced genomes, G. barbadense cv. XH21 and G. hirsutum acc. TM-1, were included in our sequencing panel as controls and showed a low missing-data rate (6.8%). We further randomly selected 68 SNPs for PCR-based sequencing in 11 accessions and found an accuracy of 95.0%. The data quality was therefore considered reliable for the follow-up phylogenetic and population genetic analyses.

Overall, we identified 16,377,749 non-unique single nucleotide polymorphisms (SNPs), defined as those whose variant occurs in two or more accessions, and 144,662 non-unique InDels (1 bp to 8 kb). These variants were distributed across all 26 chromosomes, with an average density of 8.5 SNPs per kb. The SNP density in the A subgenome (9.2 SNPs per kb) was higher than that in the D subgenome (7.4 SNPs per kb). By analyzing the allele frequency of each SNP site in the 147 accessions, we identified 7,993,856 common SNPs, each with an allele frequency of >5%, including 3,203,112 intraspecific SNPs in G. hirsutum, 3,770,221 in G. barbadense, and 2,752,128 (~34.4%) nearly-fixed interspecific SNPs (SNPs with an
allele frequency of >95% in G. hirsutum or G. barbadense and <5% in the other species). These were regarded as species-specific SNPs. G. barbadense and G. hirsutum both showed high genetic diversity and were clearly differentiated from each other. Much of the genetic diversity of cotton can be quantified by SNP frequency. The allele frequency distributions of 44,250 nearly-fixed cSNPs diverged strongly between G. hirsutum and G. barbadense. We examined genetic diversity across the 26 chromosomes and observed a strong signal of differentiation between G. hirsutum and G. barbadense accessions at the whole-genome level. The fixation index (FST) values were 0.63 and 0.65 in the A and D subgenomes, respectively, slightly higher than that between the indica and japonica rice subspecies (FST = 0.55) (Huang et al., 2010). The two cultivated species evolved independently during domestication and modern breeding.

The whole-genome SNP data were used to investigate the phylogenetic relationships among all allotetraploid cotton collections. The resulting neighbor-joining (NJ) tree contained two largely divergent clades: the G. hirsutum clade (n=85) and the G. barbadense clade (n=52). Model-based analyses of population structure were performed using STRUCTURE. When K (the number of populations modeled) was set to 2, G. barbadense and G. hirsutum formed two clearly different components. When K was set to 3, there were three clearly different components: G. barbadense cultivars, G. hirsutum cultivars, and G. hirsutum races. This model-based result, along with that from principal component analysis, agreed well with the pattern in the phylogenetic tree. Together, these three lines of analysis provided genomic evidence for the genetic differentiation and domestication of Sea Island cotton and Upland cotton. In addition, the 109 G. barbadense regions homologous to the 109 G. hirsutum selective sweeps had not been subject to selection pressure. All of these results indicate independent domestication of G. hirsutum and
G. barbadense.

Asymmetric introgression between G. hirsutum and G. barbadense. To analyze introgression between the tetraploid cottons, a recently developed "3-population test" method (Myles et al., 2010; Reich et al., 2009) was used for modeling. Introgression events were successfully traced using the population-scale genomic data generated in the present study. On average, 0.2% of genomic regions showed clear introgression events. These regions partly overlap with fiber quality-related QTLs, suggesting that G. hirsutum may carry more adaptation alleles for genetic improvement. Genomic evidence from the present study revealed subsequent introgressions from local wild G. hirsutum or races into G. barbadense during its northward movement in America. A small number of reciprocal interspecific introgression events between the two cultivated species may account for their improved adaptation, fiber yield, and quality. Based on the introgression events between G. hirsutum races and G. barbadense, we found that G. barbadense indeed originated from Peru and Brazil in the northwest of South America. There are now three types of G. barbadense: the Egyptian type, the Pima type, and the Central Asia type.

Signatures of selection and adaptive trait associations in G. hirsutum. The genetic diversity of modern cultivars was low, only 34.2% (32.4% and 35.0% for the A and D subgenomes, respectively) of that in races, indicating a strong genetic bottleneck during Upland cotton domestication. Through population-scale comparisons of G. hirsutum races and cultivars, we identified 109 selective sweeps, occupying 3.4% of the cotton genome. Interestingly, 12 homoeologous pairs of selective sweeps with high or modest selection signals (πrace/πcultivar ranging from 15.4 to 39.6) were detected between the A and D subgenomes. Using RNA-seq data from 35 tissues, we found that the 76 fiber-related and 115 seed germination-related genes expressed during fiber development and seed germination were
higher in the selective sweeps than in the whole genome. The two loci associated with the strongest selection signal (πrace/πcultivar = 100.0) were located on Chr. D11 and Chr. A06. We further examined the overlap between selective sweeps and QTLs for various agronomic traits and found that the selective sweep on Chr. D11 overlapped with several QTLs controlling fiber length. Another strong selective sweep, on Chr. A06, overlapped with QTLs for fiber length and lint percentage. These results provide a genomic basis for improving cotton production and for further study of the evolution and domestication of polyploid crops.

Polyploidy, or whole-genome duplication, is a prominent driving force in plant evolution, and allotetraploid cotton is an important model crop for studying polyploidy and gene evolution in plants. The fate of duplicated (homoeologous) genes is poorly understood, both functionally and evolutionarily. In general, duplicated genes undergo defunctionalization (pseudogenization), subfunctionalization, or neofunctionalization. Before this study, while analyzing yield-related heterosis, we isolated a novel ethylene-responsive element binding factor (designated GhERF1-7A) from cotton. It was differentially expressed in seedling leaves between one hybrid (XZM2) and two parents (ZMS12 and 8891) and was highly similar to GhERF1 (renamed GhERF1-7D) (Qiao et al., 2008). The two genes were localized on a pair of homoeologous chromosomes, Chr. 07 (A07) (GhERF1-7A) and Chr. 16 (D07) (GhERF1-7D), respectively (Lu KY). They are therefore a pair of homoeologous genes with different functions and expression profiles, and both are closely related to the B3 subgroup of the ERF subfamily. We also found that GhERF1-7A was localized in a QTL controlling the number of bolls per plant (Lu KY). In our study, we found that GhERF1-7A underwent all three types of duplicated-gene fate: loss of function (defunctionalization) in the female parent CRI12, expression
subfunctionalization in various tissues and organs and under different abiotic stresses, and neofunctionalization in increasing the number of siliques per plant in transgenic Arabidopsis. These findings provide good molecular evidence for the functional differentiation of duplicated genes in polyploid plants.

The GhERF1-7A gene lost function in the female parent CRI 12. To explore whether the divergence of the two genes occurred after the formation of allopolyploid cotton, we isolated the two GhERF1 genes from diploid G. arboreum and G. raimondii and found that the 'A' base insertion was absent from 13 D-genome diploids and 23 G. arboreum cottons. We therefore confirmed that the loss of function of GhERF1-7A caused by a base insertion occurred after the formation of allotetraploid cotton and might be involved in allotetraploid cotton evolution and domestication.

The GhERF1 genes were differentially expressed in various tissues and organs and under different stresses. GhERF1-7D may play a key role in abiotic stress responses, whereas GhERF1-7A underwent pseudogenization, subfunctionalization, or neofunctionalization after the formation of allotetraploid cotton and might play an important role in improving cotton yield by increasing the number of bolls per plant. Overexpression of GhERF1-7A in Arabidopsis significantly increased the number of siliques and total seed yield. In addition, GhERF1-7D may improve abiotic stress resistance in cotton, and GhERF1-7A is an excellent candidate for improving crop production. Both genes are examples for studying the evolution of homoeologous genes.

To investigate whether GhERF1-7A is a domestication or improvement gene in Upland cotton, we isolated and sequenced the GhERF1-7A/7D ORFs from 524 Upland cotton accessions collected worldwide, including 191 primitive races and 333 improved cultivars. Sequence analysis revealed that GhERF1-7D had a full ORF and was conserved in all accessions, with no insertion or deletion
mutation. In contrast, the insertion mutation of GhERF1-7A is prevalent in Upland cotton accessions, occurring in 25.5% (85 out of 333) of cultivars and 52.9% (101 out of 191) of primitive races. These results suggest that GhERF1-7A underwent pseudogenization, or lost its primary function, through the frameshift mutation in early accessions during cotton polyploidization. Domestication and modern genetic improvement for higher yield led to a significant decrease in the proportion of individuals carrying the A-base insertion in GhERF1-7A. The sequence and functional evolution of GhERF1-7A may reveal the functional diversity of duplicated genes and multiple domestication of allotetraploid cotton.

The APETALA2/ethylene-responsive element binding protein (AP2/EREBP) transcription factors constitute one of the largest and most conserved gene families in plants, playing crucial roles in plant growth, development, and responses to biotic and abiotic stress. A total of 269 AP2/EREBP genes were identified in the Gossypium raimondii (D5) genome. They were classified into four subfamilies (AP2, RAV, DREB, and ERF), while 4 genes were grouped as outsiders. The protein domain architecture and intron/exon structure are relatively conserved and simple within each subfamily. The AP2/EREBP genes in G. raimondii are distributed unevenly across all chromosomes. We identified 73 tandemly duplicated genes and 221 segmentally duplicated gene pairs that contributed to the expansion of the AP2/EREBP superfamily. Transcriptome analysis showed that 504 AP2/EREBP genes in G. hirsutum were expressed in at least one tested tissue. Furthermore, expression profiles under abiotic stress showed that 68% of the DREB and ERF subfamily genes in G. hirsutum were responsive to different stresses: 132 genes were induced by cold, 63 by drought, and 94 by heat. qRT-PCR confirmed that 13 GhDREB and 15 GhERF genes were induced by cold and/or drought. Of the 111 tandemly duplicated genes in TM-1, 53 genes
had no detectable transcripts. In addition, some homoeologous genes showed biased expression between the A and D subgenomes. Several homoeologous pairs, taken as examples, showed similar, lost, divergent, or specific expression patterns, revealing the functional diversity of duplicated genes in G. hirsutum. Our genome-wide analysis of AP2/EREBP genes in cotton provides valuable information for characterizing their molecular functions and for revealing the evolution of AP2/EREBPs in polyploid plants.
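
The definition of nearly-fixed interspecific SNPs used in the abstract (allele frequency of >95% in one species and <5% in the other) can be illustrated with a minimal Python sketch. This is not the pipeline used in the dissertation: the `SnpFrequency` container, the function names, and the toy frequencies are all hypothetical and exist only for illustration. The only figures taken from the abstract are the two thresholds and the counts 2,752,128 and 7,993,856.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SnpFrequency:
    """Hypothetical container: allele frequency of one SNP in each species."""
    snp_id: str
    freq_hirsutum: float    # frequency among G. hirsutum accessions
    freq_barbadense: float  # frequency among G. barbadense accessions


def is_nearly_fixed(snp: SnpFrequency, high: float = 0.95, low: float = 0.05) -> bool:
    """True if the allele is >high in one species and <low in the other."""
    h, b = snp.freq_hirsutum, snp.freq_barbadense
    return (h > high and b < low) or (b > high and h < low)


def nearly_fixed_snps(snps: Iterable[SnpFrequency]) -> List[SnpFrequency]:
    """Keep only the SNPs that satisfy the nearly-fixed criterion."""
    return [s for s in snps if is_nearly_fixed(s)]


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Toy frequencies, invented for illustration (not data from the study).
toy_panel = [
    SnpFrequency("snp1", 0.99, 0.01),  # nearly fixed in G. hirsutum
    SnpFrequency("snp2", 0.60, 0.40),  # shared polymorphism
    SnpFrequency("snp3", 0.02, 0.97),  # nearly fixed in G. barbadense
    SnpFrequency("snp4", 0.95, 0.00),  # exactly 0.95 is not > 0.95
]

selected_ids = [s.snp_id for s in nearly_fixed_snps(toy_panel)]

# Share of nearly-fixed SNPs among common SNPs, from the counts in the abstract.
reported_share = percent(2_752_128, 7_993_856)
```

With these strict inequalities, a SNP sitting exactly on a threshold is excluded. Whether the original analysis treated boundary values this way is not stated in the abstract.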
Keywords/Search Tags: Gossypium hirsutum, G. barbadense, Evolution, Domestication, Resequencing, Fiber, Seed development, ERF1, Polyploidy, Functional Differentiation, Homoeologous genes, Fruit Yield, AP2/EREBP family, stress tolerance, growth and development