Genetic Diversity of Parents in the Seed Orchard of Pinus massoniana Using SNP Calling from Genotyping-by-Sequencing

Posted on: 2024-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Liu
GTID: 2543306938987889
Subject: Forest science

Abstract/Summary:
Masson pine (Pinus massoniana Lamb.), a species of Pinus L. in the family Pinaceae, is a tall tree valued for fast growth, high yield, high quality, tolerance of drought and infertile soils, strong adaptability, wide use and a high degree of comprehensive utilization. Among the Pinus species in China, it is the main industrial timber species, with the widest distribution, the largest area and the highest volume. Breeding improved varieties of Masson pine has received extensive attention. Evaluating population genetic diversity and characterizing genetic distinctness among individuals are the main tasks of germplasm genetic management in a seed orchard, and high-throughput sequencing now provides a convenient approach to germplasm characterization and evaluation for this purpose. In this study, 70 samples selected from the Masson pine clonal seed orchard at the Baisha State-owned Forest Farm in Shanghang, Fujian Province, were sequenced by genotyping-by-sequencing (GBS), and the contigs used as the de novo reference for SNP calling were assembled with different pipelines and assembly strategies. Using the resulting high-throughput SNPs, the genetic diversity and genetic distinctness of the samples were assessed and the relationships among them were revealed. These findings provide useful information for genetic management in the Masson pine seed orchard. The main results are as follows:

DNA was extracted from the needles of the 70 samples, double-digested with EcoRV and ScaI, built into GBS sequencing libraries and sequenced on an Illumina NovaSeq™ 6000 in 150 bp paired-end mode, yielding 140 paired .fastq files. After removal of adapters and low-quality sequences, a total of 603.71 GB of clean data was obtained. The clean data per file ranged from 2.82 GB to 6.02 GB, with an average of 4.31 GB; read lengths ranged from 40 bp to 137 bp, with an average of 135.07 bp; the average GC content was 41.72% and Q30 was 94.29%.

Two assemblers, npGeno and MEGAHIT, were used for de novo contig assembly of Masson pine. Contig quality was evaluated with QUAST, and coverage was evaluated by BLAST alignment of each de novo assembly against the whole-genome sequence of loblolly pine (Pinus taeda L.). After screening and quality control, the four assemblies from each of npGeno and MEGAHIT with the highest coverage of the loblolly pine genome were selected as de novo references for SNP calling. Among the npGeno assemblies, group s9k70 was optimal, with the largest number of contigs (11 233), a total length of 261 117 bp and an average length of 201.30 bp; its N50, GC content and coverage of the loblolly pine genome were 709 bp, 47.35% and 0.033%, respectively. Among the MEGAHIT assemblies, group s48k141 was optimal, with the largest number of contigs (1 561 250), a total length of 512 713 695 bp and an average length of 328.40 bp; its N50, GC content and coverage of the loblolly pine genome were 705 bp, 40.12% and 6.723%, respectively. MEGAHIT thus assembled far more contigs than the npGeno pipeline.

Using each of the four optimal contig sets from npGeno and from MEGAHIT as the de novo reference, eight SNP sets were obtained. The references assembled with MEGAHIT yielded far more SNPs than those assembled with npGeno; the largest of the eight sets contained 26 478 893 SNPs (megahit-s48k141). Each set was filtered to a missing ratio of 0 and a different-genotype frequency greater than 0.05 across the 70 samples. The largest number of SNPs retained in the npGeno groups was 3101 (s9k70m0f5), and in the MEGAHIT groups 153 549 (s48k141m0f5). Raising the different-genotype frequency threshold to 0.1 removed more SNPs: the largest npGeno set retained only 665 SNPs (s9k70m0f10), while the largest MEGAHIT set retained 32 247 (s48k141m0f10). The predominant substitution type among the SNPs in Masson pine was the A/G transition. The minor allele frequency (MAF) distribution was L-shaped, with most SNPs having an MAF below 0.1. When the relationship patterns among samples from Principal Coordinate Analysis (PCoA) were compared across the SNP sets, the patterns from the MEGAHIT sets were more stable under different filtering parameters than those from the npGeno sets. In conclusion, MEGAHIT is the preferred software for de novo reference assembly from sequencing data of large, complex forest-tree genomes.

Two optimal SNP sets, s9k70m0f5 (3101 SNPs) from npGeno and s48k141m0f10 (10530 SNPs) from MEGAHIT, were selected to study the genetic diversity of the parental population in the Masson pine clonal seed orchard. When the 70 samples were grouped into their three natural stand populations (Wuping, Yongding and Liancheng), genetic diversity within the three populations was relatively high, but genetic differentiation was low: AMOVA attributed less than 0.316% of the genetic variation to differences among the three populations. Based on STRUCTURE and PCoA analyses, the 70 samples were divided into two populations, pop Ⅰ (8 samples) and pop Ⅱ (62 samples). The genetic differentiation between pop Ⅰ and pop Ⅱ was 7.82%, higher than among the natural stand populations, indicating that the parents in the seed orchard are better divided into these artificial populations.

From the characterization of genetic distinctness, the 14 most genetically distinct samples (20%) and the 14 most genetically redundant samples (20%) were identified, and the 24 sample pairs with the greatest genetic distinctness and redundancy were also selected. These findings provide useful information for the genetic management of parents in the Masson pine clonal seed orchard at Baisha.
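
The clean-data figures reported above (mean read length, GC content, Q30) are per-base tallies over FASTQ records. The sketch below shows one way to compute them; it assumes Phred+33 quality encoding and plain four-line FASTQ records, and it is an illustration only, not the quality-control tool used in the thesis (which the abstract does not name).

```python
# Sketch: read-level summary statistics for GBS clean data.
# Assumes Phred+33 qualities and plain four-line FASTQ records.
from typing import Iterable, List, NamedTuple


class ReadStats(NamedTuple):
    reads: int
    mean_length: float
    gc_percent: float
    q30_percent: float


def fastq_stats(lines: Iterable[str]) -> ReadStats:
    """Tally read count, mean length, GC% and Q30% over FASTQ text lines."""
    reads = bases = gc = q30 = 0
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, _plus, qual = record
        record = []
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        reads += 1
        bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        # Phred+33: score = ASCII code - 33; a base counts toward Q30 if score >= 30.
        q30 += sum(1 for ch in qual if ord(ch) - 33 >= 30)
    if record:
        raise ValueError("truncated FASTQ: incomplete final record")
    if bases == 0:
        return ReadStats(reads, 0.0, 0.0, 0.0)
    return ReadStats(reads, bases / reads, 100 * gc / bases, 100 * q30 / bases)


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "GGAT", "+", "#I#I"]
    print(fastq_stats(demo))
    # ReadStats(reads=2, mean_length=4.0, gc_percent=50.0, q30_percent=75.0)
```

For a real file, passing an open handle (`with open(path) as fh: fastq_stats(fh)`) streams records without loading the whole file; gzipped input would need `gzip.open(path, "rt")` instead.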
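
Contig count, total and mean length, N50 and GC content are the statistics used above to compare the npGeno and MEGAHIT assemblies. As a minimal sketch of what those numbers mean, the following computes them from a list of contig sequences. It uses the common N50 definition (the largest length L such that contigs of length L or longer cover at least half of the assembly) and counts every character, including any N, in the length and GC denominators, which may differ slightly from the conventions of a dedicated assessment tool.

```python
# Sketch: contig-set summary statistics (count, total, mean, N50, GC%).
from typing import NamedTuple, Sequence


class AssemblyStats(NamedTuple):
    contigs: int
    total_length: int
    mean_length: float
    n50: int
    gc_percent: float


def n50(lengths: Sequence[int]) -> int:
    """Largest L such that contigs >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison, no float rounding
            return length
    return 0  # empty input


def assembly_stats(contigs: Sequence[str]) -> AssemblyStats:
    lengths = [len(c) for c in contigs]
    total = sum(lengths)
    gc = sum(1 for c in contigs for base in c.upper() if base in "GC")
    return AssemblyStats(
        contigs=len(contigs),
        total_length=total,
        mean_length=total / len(contigs) if contigs else 0.0,
        n50=n50(lengths),
        gc_percent=100 * gc / total if total else 0.0,
    )
```

For example, `assembly_stats(["ACGT", "GGCCAA", "AT"])` gives 3 contigs, 12 bp in total, a mean of 4.0 bp, an N50 of 6 bp and 50.0% GC. Because N50 is length-weighted, it can sit well above the mean contig length when a few long contigs carry much of the assembly.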
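
The SNP set labels (for example s9k70m0f5 and s48k141m0f10) appear to encode a missing-ratio threshold (m0) and a different-genotype frequency threshold (f5 = 0.05, f10 = 0.1). The sketch below applies filters of that kind and also computes the minor allele frequency behind the L-shaped MAF distribution. Two points are assumptions, not taken from the thesis: genotypes are coded as the number of alternate alleles (0, 1, 2) with None for missing, and "different-genotype frequency" is read as the share of called samples whose genotype differs from the most common genotype at that SNP.

```python
# Sketch of SNP filtering by missing ratio and "different-genotype frequency".
# ASSUMPTIONS: 0/1/2 alternate-allele genotype codes, None = missing; the
# different-genotype frequency is taken as the share of called samples whose
# genotype differs from the most common genotype at the SNP.
from collections import Counter
from typing import List, Optional, Sequence

Genotype = Optional[int]


def keep_snp(row: Sequence[Genotype], max_missing: float = 0.0,
             min_diff_freq: float = 0.05) -> bool:
    called = [g for g in row if g is not None]
    n = len(called)
    if n == 0 or (len(row) - n) > max_missing * len(row):
        return False  # too many missing genotypes (m0 allows none)
    majority = Counter(called).most_common(1)[0][1]
    # Compare counts, not 1 - majority/n, to sidestep rounding like 1 - 0.9.
    return (n - majority) > min_diff_freq * n


def minor_allele_frequency(row: Sequence[Genotype]) -> float:
    called = [g for g in row if g is not None]
    if not called:
        return 0.0
    p_alt = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    return min(p_alt, 1 - p_alt)


def filter_snps(matrix: Sequence[Sequence[Genotype]], **kw: float) -> List[int]:
    """Indices of SNP rows (one row per SNP, one column per sample) that pass."""
    return [i for i, row in enumerate(matrix) if keep_snp(row, **kw)]
```

Raising `min_diff_freq` from 0.05 to 0.10 can only remove SNPs, never add them, which is consistent with the sharp drop in SNP counts from the f5 to the f10 sets.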
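
The finding that A/G transitions dominate comes from tallying the reference and alternate bases of each SNP. A minimal sketch, assuming biallelic SNPs supplied as (ref, alt) base pairs such as those in the REF and ALT columns of a VCF file, counts the six unordered substitution types and splits them into transitions (A<->G, C<->T) and transversions (all other changes):

```python
# Sketch: substitution spectrum and transition/transversion counts.
from collections import Counter
from typing import Iterable, Tuple

BASES = set("ACGT")
# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def substitution_spectrum(pairs: Iterable[Tuple[str, str]]) -> Tuple[Counter, int, int]:
    """Count unordered substitution types plus transition/transversion totals."""
    types: Counter = Counter()
    transitions = transversions = 0
    for ref, alt in pairs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, multi-base or ambiguous alleles, non-variants
        types["/".join(sorted((ref, alt)))] += 1  # A>G and G>A both count as A/G
        if frozenset((ref, alt)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return types, transitions, transversions
```

`types.most_common(1)` then names the dominant substitution type, and `transitions / transversions` gives the Ts/Tv ratio if that summary is wanted.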
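
The abstract does not state how genetic distinctness was measured, so the following is only a hypothetical sketch of one common approach: an allele-sharing distance between samples (mean of |g_i - g_j| / 2 over SNPs with 0/1/2 genotype codes and no missing data, as after m0 filtering), a per-sample distinctness score equal to the mean distance to all other samples, and a ranking that reports the top and bottom 20% of samples plus the most and least divergent pairs. The function names and parameters are illustrative, not the thesis's method.

```python
# Hypothetical sketch: rank samples and sample pairs by genetic distinctness.
# ASSUMPTIONS: 0/1/2 genotype codes, no missing data, allele-sharing distance,
# distinctness = mean distance to all other samples. The thesis may differ.
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def allele_sharing_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean |g_a - g_b| / 2 over SNPs: 0 = identical, 1 = opposite homozygotes."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def rank_distinctness(samples: Dict[str, Sequence[int]], fraction: float = 0.2,
                      n_pairs: int = 1) -> Dict[str, list]:
    names = list(samples)
    if len(names) < 2:
        raise ValueError("need at least two samples")
    dist: Dict[Pair, float] = {
        (p, q): allele_sharing_distance(samples[p], samples[q])
        for p, q in combinations(names, 2)
    }
    # Mean distance from each sample to every other sample.
    mean_dist = {
        s: sum(d for pair, d in dist.items() if s in pair) / (len(names) - 1)
        for s in names
    }
    k = max(1, round(fraction * len(names)))
    by_mean: List[str] = sorted(names, key=mean_dist.__getitem__, reverse=True)
    by_pair: List[Pair] = sorted(dist, key=dist.__getitem__, reverse=True)
    return {
        "distinct": by_mean[:k],                      # highest mean distance
        "redundant": by_mean[-k:][::-1],              # lowest mean distance first
        "distinct_pairs": by_pair[:n_pairs],          # most divergent pairs
        "redundant_pairs": by_pair[-n_pairs:][::-1],  # most similar pairs first
    }
```

With 70 samples, `fraction=0.2` gives k = 14, the count reported in the abstract; `n_pairs` can be raised (for example to 24) to list more pairs at each end of the ranking. The pairwise loop is quadratic in the number of samples, which is negligible at seed-orchard scale.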
Keywords/Search Tags: Pinus massoniana Lamb., Genotyping-by-sequencing, SNP calling, Genetic diversity, Genetic distinctness