Genetic diversity is the foundation for evaluating germplasm resources and exploiting heterosis. Maize exhibits much stronger heterosis than other major crops such as rice and wheat. Analyzing the genetic diversity and population structure of the inbred lines used in maize breeding is therefore essential, because it guides breeders in designing hybrid crosses and improving inbred lines. The maize genome harbors abundant genetic variation; single nucleotide polymorphisms (SNPs), a major type of variation, have been widely used in germplasm identification, association analysis, and gene mining. The major germplasm foundation of the maize breeding programs of Southwest China has long lacked deep and systematic study, especially with regard to the genetic background of newly selected inbred lines, so it is necessary to understand this germplasm foundation, clarify the genetic background, and simplify the heterosis utilization model. In addition, with the development of genome research, copy number variation (CNV) and its function have recently become an important subject in genetic diversity studies. Studying the impact of CNV on gene expression helps to explain the missing heritability beyond GWAS and also provides a reference for revealing the distribution characteristics of CNV in the maize genome and the regulatory mechanisms of gene expression.

In this study, 362 inbred lines, most of them widely used in Southwest breeding programs, are collected to construct a Chinese association population. Using 56,110 SNPs from the MaizeSNP50 BeadChip, the population structure, genetic diversity, and linkage disequilibrium (LD) of this population are characterized. In addition, using next-generation sequencing (NGS) data from an American maize association population containing 271 inbred lines, a systematic analysis of the genome-wide CNV distribution is performed. Moreover, in combination with gene expression levels derived from transcriptome sequencing, the proportion of each gene overlapped by CNV and the eQTLs of gene expression levels are identified. The main conclusions are as follows:

1. A total of 362 inbred lines, most of them widely used in Southwest breeding, are genotyped with 56,110 SNPs from the MaizeSNP50 BeadChip. This helps to clarify the genetic background and relationships of those inbred lines that were recently developed or whose origin is unclear. In this population, two (Tropical and Temperate), three [Tropical, Stiff Stalk (SS), and non-Stiff Stalk (NSS)], four [Tropical, group A germplasm derived from modern U.S. hybrids (PA), group B germplasm derived from modern U.S. hybrids (PB), and Reid], and six [Tropical, PB, Reid, Iowa Stiff Stalk Synthetic (BSSS), PA, and North] subgroups are identified, together with the genetic diversity characteristics of each group. These results pave the way for genome-wide association studies using the same population. Moreover, based on the clustering patterns and the crossing patterns of some representative hybrids, a pathway for simplifying the heterotic groups as the value of K decreases is proposed.

2. When the population is divided into tropical and temperate germplasm, the Tropical group is more diverse than the Temperate group (GD of 0.348 vs. 0.331). Seven low-genetic-diversity regions and one high-genetic-diversity region are identified in common in the Temperate group, the Tropical group, and the entire panel. SNPs with significant differences in allele frequency between the Tropical and Temperate groups are also evaluated. Among them, a region located at 130 Mb on Chromosome 2 has both the highest number of SNPs with significant differences and the highest ratio of significant SNPs to total SNPs. With respect to LD decay distance, the Temperate group shows a greater distance (2.5-3 Mb) than the entire panel (0.5-0.75 Mb) and the Tropical group (0.25-0.5 Mb). In addition, a comparison between S37, a representative tropical line in Southwest China, and its 30 most similar derived lines indicates that a large region at 30-120 Mb on Chromosome 7 has been conserved during the breeding process.

3. CNV identification in the American association population shows that deletions are far more numerous than duplications, although their lengths follow similar distributions. Across all inbred lines, the average total length of CNV events is about 291.6 Mb, accounting for 14.2% of the genome size. Chromosome 1 has the highest number (996.31) and total length (47.00 Mb) of CNVs, whereas Chromosome 10 has the lowest (422.97 and 20.60 Mb). VaW6 has the highest CNV number and total length among the inbred lines. Among the Tropical and Subtropical (TS), NSS, and SS subpopulations, the SS group has the lowest number and the shortest total length of CNVs; the NSS and TS groups show similar CNV numbers and lengths, with the NSS group slightly higher than the TS group.

4. In this study, 11,060 CNV regions (CNVRs) are identified, accounting for 73.1% of the total genome size. Moreover, 20,653 genes overlapped by CNVRs are extracted for Gene Ontology (GO) analysis. The results indicate that these genes are mostly enriched in the biological processes of response to stress, metabolic process, response to stimulus, and regulation of gene expression. A comparison of CNV distribution with gene density, repeat density, distance to the centromere, and recombination frequency demonstrates that CNV distribution is significantly positively correlated with these genomic features.

5. To determine whether the proportion of a gene overlapped by CNV affects its expression level, genes overlapped by CNVs are compared with genes that are not; the former show significantly lower Z-scores of gene expression. Moreover, linear regression between the proportion of a gene overlapped by different types of CNVs and the average rank of its expression in the population shows that the proportions of all CNVs and of deletions are both positively correlated with the rank of gene expression, whereas duplications show an opposite but weaker trend. In addition, the number of genes significantly affected by CNV varies greatly.

6. In the analysis of CNV-based eQTLs for gene expression levels in different tissues and stages, a total of 5,675 genes with eQTLs are detected across all tissues and stages. The positions of CNVs relative to their associated genes are highly diverse. Most eQTLs (60.55%) are located on the same chromosome as the associated gene, and of these the majority (>70%) lie around the gene; fewer eQTLs are located on other chromosomes. In addition, the associated genes differ significantly among tissues and stages: genes detected in more than half (>3) of the tissues and stages account for only 10.31% of all associated genes. The analysis of genome-wide enrichment of eQTL-CNVs indicates that the number of CNV alleles varies among tissues and stages. A CNV allele window located at 61752001-61755000 on Chromosome 10 is enriched for the most eQTL-CNVs in five tissues and stages and in the sum over all stages. The enrichment of eQTLs upstream and downstream of genes is also estimated; the results indicate that most eQTLs are located within 250 kb upstream or downstream of the gene, with the maximum at 125 kb.

7. A comparative analysis between the eQTLs identified by CNVs and adjacent SNPs is performed. The detection ratio by these SNPs is highest for genes whose eQTLs surround the associated gene (average 99.76%), lower for genes whose eQTLs are located on the same chromosome as the associated gene (average 97.48%), and lowest for genes whose eQTLs are located on different chromosomes (average 72.19%). Moreover, the same trend is found in seven different tissues and stages.
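
As an illustration of the gene diversity (GD) statistic reported in conclusion 2, the following is a minimal sketch and not the pipeline used in this study. It assumes that GD refers to Nei's gene diversity (expected heterozygosity), 1 - sum(p_i^2) per locus, averaged over loci; the abstract does not state which estimator or software was used. The function names and the toy allele counts are hypothetical.

```python
from typing import Sequence


def gene_diversity(allele_counts: Sequence[int]) -> float:
    """Gene diversity at one locus, assumed here to be 1 - sum(p_i^2).

    allele_counts holds the number of observed copies of each allele
    at the locus (two entries for a biallelic SNP).
    """
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("locus has no called alleles")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


def mean_gene_diversity(loci: Sequence[Sequence[int]]) -> float:
    """Average per-locus gene diversity over a set of loci."""
    if not loci:
        raise ValueError("no loci supplied")
    values = [gene_diversity(counts) for counts in loci]
    return sum(values) / len(values)


# Hypothetical toy data: allele counts at three biallelic SNPs.
toy_loci = [(5, 5), (10, 0), (3, 1)]
toy_gd = mean_gene_diversity(toy_loci)
```

For a biallelic SNP this quantity ranges from 0 (one allele fixed) to 0.5 (both alleles at equal frequency), so panel-wide averages such as 0.348 and 0.331 fall within the expected range.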