| Pinus massoniana Lamb. belongs to the genus Pinus of the family Pinaceae and is an important coniferous species for afforestation in South China. Genotyping-by-sequencing (GBS) has become an economical and convenient approach for developing high-throughput SNP markers in forest trees, but the high proportion of missing SNPs caused by non-uniform restriction enzyme digestion of the genome has severely limited the application of GBS-derived SNPs in masson pine molecular breeding. Selecting appropriate restriction enzymes (REs) to optimize GBS library construction and increase fragment coverage of the genome should improve the quality of the identified SNPs. In this study, an in silico analysis with IgCoverage was carried out on genomes homologous to that of masson pine, and 10 sets of restriction enzymes were selected for GBS and SNP calling in six masson pine individuals. The de novo references (de novo REFs) assembled from the GBS data of the 10 RE sets were integrated into a mixed-enzyme-digestion representative genomic resource (mixREF) of masson pine, to be used as a reference for GBS-SNP calling. The applicability of mixREF for improving the quality of SNP identification in masson pine GBS was also evaluated. The study provides a research foundation and technical strategy for optimizing REs in GBS to improve the quality of high-throughput SNP calling in masson pine. The main results are as follows: 1) The IgCoverage program was used for in silico analysis of genomes homologous to masson pine, and different RE combinations were selected to evaluate the efficiency of GBS in masson pine. Based on the IgCoverage results for the Pinus tabuliformis and Pinus taeda genomes, five single-RE sets (NlaIII, 65.20%; RsaI, 26.82%; ApeKI, 11.54%; EcoRI, 1.86%; PastI, 0.57%) and five RE-pair sets (AluIDpnI, 33.76%; TaqIMseI, 11.52%; BfaIHhaI, 17.64%; SalISphI, 0.65%; EcoRVScaI, 0.33%) with different
levels of IgC values (high, medium, and low) were selected for GBS of six masson pine samples. Among the five single-RE sets, ApeKI, which had a medium IgC level in the in silico analysis, yielded the largest number of contigs (3 171 665) in its de novo REF and the largest number of reliable SNPs (3213) after SNP quality control. Among the five RE-pair sets, SalISphI yielded the largest number of contigs (2 180 671) in its de novo REF and the largest number of reliable SNPs (2326) after SNP quality control. The 10 de novo REFs were aligned with BLAST against the Pinus tabuliformis genome; the proportion of contigs with identity above 95% ranged from 50.10% to 59.50%, and coverage of the Pinus tabuliformis genome ranged from 1.57% to 9.12%. The highest coverage among the five single-RE sets (9.12%) was obtained with ApeKI, and the highest among the five RE-pair sets (6.93%) with SalISphI. 2) Ten sets of SNPs were called from the GBS data of the six masson pine samples against the 10 de novo REFs. A series of quality control steps was applied to improve SNP quality, including removal of repetitive loci, filtering by missing-data proportion and MAF, functional annotation screening, and 35 bp interval distance filtering. The ApeKI and TaqIMseI sets gave more stable SNPs, with one or two SNPs per contig, whereas the other RE sets tended to have more SNPs per contig. Among the five single-RE sets, ApeKI retained the largest number of reliable SNPs (3957), accounting for 13.94% of the SNPs initially identified. Among the five RE-pair sets, SalISphI retained the largest number of reliable SNPs (2326), accounting for 17.61% of the SNPs initially identified. Meanwhile, the downloaded PacBio-sequenced mRNA transcripts of
masson pine (TransREF) were used as a reference to call 10 sets of SNPs from the same GBS data of the six samples sequenced with the 10 RE sets. With TransREF as the reference, the largest numbers of SNPs were again obtained with ApeKI, SalISphI, and TaqIMseI. These three RE sets are therefore recommended as the preferred REs for constructing GBS sequencing libraries for masson pine. 3) The SNP-associated contigs in the 10 de novo REFs obtained from the GBS data of the 10 RE sets were integrated, and after removal of redundant contigs a representative genomic resource of masson pine (mixREF) was developed. mixREF contains 103 869 contigs, with an N50 of 708 and a GC content of 45.19%; the annotation rates in the Nr, KEGG, and GO databases were 24.75%, 17.85%, and 3.64%, respectively. For all 10 sets of GBS data from the six samples, more SNPs were obtained with mixREF as the reference than with the corresponding de novo REF of each RE set. The two largest SNP sets were obtained with TaqIMseI and SalISphI, with proportions of 156.00% and 146.00%, respectively. For GBS data from 36 masson pine individuals generated with TaqIMseI, SNP calling against mixREF yielded more reliable SNPs than calling against the other two references (de novo REF and TransREF), and the PCoA plots of the 36 individuals revealed more reliable and informative genetic background information. The representative genomic resource of masson pine is therefore recommended for obtaining more reliable, higher-quality SNPs in GBS of masson pine. In summary, this study provides a feasible technical strategy for improving the quality of SNP calling by optimizing REs for high-throughput SNP development in GBS of masson pine. |
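
The quality control steps listed under result 2) (missing-data proportion, MAF, and 35 bp interval distance filtering) can be illustrated with a minimal Python sketch. The abstract does not state the thresholds or the exact rule used for the 35 bp interval, so the values `max_missing=0.2` and `min_maf=0.05`, the rule "drop a SNP that lies within 35 bp of the previously kept SNP on the same contig", and the names `Snp` and `filter_snps` are all assumptions made for illustration, not the study's pipeline. Removal of repetitive loci and functional annotation screening are not covered.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    contig: str
    pos: int
    # One entry per sample: alternate-allele count (0, 1, 2) or None if missing.
    genotypes: list


def missing_rate(snp):
    """Fraction of samples with no genotype call."""
    return sum(g is None for g in snp.genotypes) / len(snp.genotypes)


def maf(snp):
    """Minor allele frequency over called diploid genotypes."""
    called = [g for g in snp.genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_snps(snps, max_missing=0.2, min_maf=0.05, min_gap=35):
    """Apply missing-rate, MAF, and per-contig spacing filters.

    All three thresholds are hypothetical defaults; the spacing rule
    (keep the first SNP, skip later ones closer than min_gap) is an
    assumed reading of "35 bp interval distance filtering".
    """
    kept = []
    last_kept_pos = {}  # contig -> position of the most recently kept SNP
    for snp in sorted(snps, key=lambda s: (s.contig, s.pos)):
        if missing_rate(snp) > max_missing or maf(snp) < min_maf:
            continue
        prev = last_kept_pos.get(snp.contig)
        if prev is not None and snp.pos - prev < min_gap:
            continue
        kept.append(snp)
        last_kept_pos[snp.contig] = snp.pos
    return kept
```

With six samples, as in the study, a SNP missing in two samples has a missing rate of about 0.33 and would be removed under the assumed 0.2 cutoff; a stricter or looser cutoff changes how many SNPs each RE set retains.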
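
The two assembly statistics reported for mixREF, N50 and GC content, follow standard definitions and can be computed from a list of contig sequences as sketched below. The function names `n50` and `gc_content` are illustrative, and the sketch assumes GC content is taken over all characters in the contigs (including any ambiguous bases); the abstract does not say which tool or convention was used.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_content(seqs):
    """Fraction of G and C characters across all sequences (case-insensitive)."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    total = sum(len(s) for s in seqs)
    return gc / total if total else 0.0
```

For example, contigs of lengths 2, 3, 4, 5, and 6 total 20 bases; the two longest (6 and 5) already cover 11 bases, so the N50 is 5.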