As an important agricultural animal and a prospective model for medical research, the pig has considerable research value. Genome and transcriptome sequencing are fundamental to the life sciences. A pig reference genome has already been released, but its annotation remains incomplete. A full-length (FL) transcriptome can greatly improve genome annotation and provide a more solid basis for research on, and application of, gene function, expression regulation, transcriptome networks, and related topics. In this study, the third-generation single-molecule real-time (SMRT) sequencing technology of PacBio was used for FL transcriptome sequencing of 38 pig tissues, and Illumina transcriptome sequencing was performed on 8 tissues. The results reveal the complexity of the pig transcriptome and provide a more comprehensive FL reference transcriptome for post-genome research. The main results are as follows:
1. For SMRT sequencing, the average read length was 12 kb. In total, 517,462 full-length sequences were obtained, including 389,781 high-quality full-length non-chimeric (FLNC) sequences. A comprehensive PacBio isoform reference transcriptome of the pig was constructed from these FLNC sequences.
2. Annotation of the high-quality FLNC sequences yielded 39,940 loci and 77,075 isoforms, and as many as 96.65% of the loci produced transcripts longer than 1 kb. The annotated loci comprised 25,481 (63.81%) single-exon loci and 14,453 (36.19%) multi-exon loci. The 77,075 isoforms comprised 29,992 (38.91%) single-exon isoforms and 47,083 (61.09%) multi-exon isoforms. Although single-exon loci were the clear majority, multi-exon isoforms outnumbered single-exon isoforms by roughly 1.6 to 1, indicating that FLNC sequences help detect transcript structure more comprehensively. The results also showed that FL pig transcripts were rarely shorter than 1 kb.
3. Of the 39,940 loci, 7,053 (17.66%) had undergone alternative splicing (AS). There were 97,727 AS
events, corresponding to 2,637 splicing modes. Events belonging to the five classical splicing modes accounted for only 21.42% (20,930). The Iso-Seq data detected more than 92,000 novel AS events and 2,100 new AS modes, indicating that a large number of AS events had not been annotated in the published porcine reference genome. The PacBio isoforms can fill these gaps and improve the usability and accuracy of the reference genome.
4. In the Iso-Seq data, a large number of AS events (23,440 AS events and 555 AS modes) were detected on pig chromosome 14, far more than on any other chromosome. In the other species examined (horse, cattle, sheep, human, and mouse), the number of genes and the number of AS events were highly correlated (r = 0.83–0.94). However, the correlation coefficient was very low (r = 0.35) in the porcine reference annotation. In pig, the number of AS events on a chromosome did not increase significantly with the number of genes on it and was only slightly influenced by gene number. This suggests that the regulatory mechanism of AS in pig is more complex and shows marked species specificity.
5. Compared with the reference annotation, 26,881 novel candidate loci were found, comprising 70,856 novel isoforms. By mapping the candidates to the NR, KOG, KO, and GO databases, 417 high-confidence novel genes were identified. The novel genes were mainly involved in system development, metabolism, transcription, translation, signal transduction, cell motility, and metabolic pathways, and in cellular process, metabolic process, biological regulation, response to stimulus, and developmental process.
6. A total of 269 fusion genes and fusion transcripts were identified in the fusion analyses. Fusion events were more likely to occur on Chr3, Chr6, and Chr7, with the most fusion isoforms on Chr7. The fusion genes were identified from 622 fusion candidate genes, which were involved in biological processes such as phagocytosis, cell metabolism, immunity, and
drug metabolism. Functional interaction network analysis showed that fusion events hardly ever occurred between genes that interact functionally with each other and rarely occurred between genes of the same family, whereas dominant disease-controlling genes readily triggered fusion events.
7. Second-generation high-throughput (Illumina) sequencing was performed on 8 tissues at two time points (day 1 and day 210). With the PacBio isoforms as the reference annotation, 25,018 loci were expressed in all 8 tissues. These genes could be divided into 40 expression trend patterns, of which 16 were significant, and their expression differed among tissues and time points. Tissue-specific analysis of the 47,083 multi-exon transcripts showed that a gene could perform different functions in different tissues through either the same isoform or a unique isoform. Although these splice isoforms shared similar basic biological functions, specific isoforms could play various roles in different tissues through different signaling pathways.
8. Based on the 77,075 PacBio transcripts, 8,838 lncRNAs were identified by building a PLEK model, along with 4,394 new lincRNAs with an average length of 2 kb. Expression analysis of lncRNAs and non-lncRNAs based on the Illumina data showed that both had tissue-specific expression, especially in back subcutaneous fat and endometrium at the adult stage and in endometrium at day 1. Meanwhile, lncRNAs were expressed at lower levels than non-lncRNAs, and multi-exon lncRNAs were expressed at higher levels than single-exon lncRNAs. Methylation analysis found that non-lncRNAs showed higher CG methylation levels in gene body regions, whereas lncRNAs had higher methylation levels in the region upstream of the transcription start site and the region downstream of the transcription termination site.
9. To explore the effects of methylation on AS, cytosine content near splice sites was examined: it dipped sharply and then rose steeply within 3 bp on the
sense strand and 2 bp on the antisense strand. These changes altered the methylation level and may thereby affect the occurrence of AS events. At the same time, CG methylation in the promoter region of a gene could inhibit the occurrence of AS events.
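
Result 4 above compares species by the correlation between the number of genes and the number of AS events. The abstract does not say which coefficient was used; the minimal sketch below assumes a Pearson correlation over per-chromosome counts. The function name `pearson_r` and all counts are hypothetical and invented for illustration only; they are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL per-chromosome counts (not from the study).
genes_per_chrom = [1200, 950, 800, 1500, 600]

# Case A: AS events scale with gene number on every chromosome.
as_events_proportional = [2400, 1900, 1600, 3000, 1200]

# Case B: one gene-poor chromosome carries a disproportionate share of events.
as_events_one_outlier = [2400, 1900, 1600, 3000, 9000]

r_proportional = pearson_r(genes_per_chrom, as_events_proportional)
r_outlier = pearson_r(genes_per_chrom, as_events_one_outlier)

print(f"proportional: r = {r_proportional:.2f}")
print(f"one outlier:  r = {r_outlier:.2f}")
```

In this toy data a single chromosome with many AS events relative to its gene count is enough to pull the coefficient far below the proportional case. This only illustrates how the statistic behaves and is not a reconstruction of the analysis behind the reported values.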