
Transcriptome Assembly Of Mouse Early Embryos Based On The Combination Of Third And Next Generation Sequencing

Posted on: 2022-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Yuan
GTID: 2480306566991909
Subject: Bioinformatics
Abstract/Summary:
Objective: To demonstrate the advantages of combining third-generation and next-generation sequencing for studying the transcriptome of early mouse embryos, compared with traditional next-generation sequencing alone. Building on these advantages, we aim to identify previously unannotated genes, isoforms, alternative splicing events, and allele-specific transcripts and splicing, and to provide a more comprehensive, high-resolution transcriptome annotation of early mouse embryos. We also explore the dynamic changes of the newly annotated resources across seven stages of early embryos and study the functional elements and regulatory mechanisms behind these changes, laying a foundation for further investigation of the mechanisms of early mouse embryo development.

Methods: We collected samples from seven stages of mouse early embryos (sperm, oocyte, 1-cell, 2-cell, 4-cell, 8-cell and blastocyst), sequenced each with both next-generation and third-generation RNA-seq, and analyzed the two datasets with bioinformatics methods. We used Iso-Seq3 on the PacBio platform to identify full-length transcripts from the third-generation sequencing data and compared them with the GENCODE annotation using Cuffcompare to obtain previously unannotated novel genes and isoforms. To analyze the homology of novel coding transcripts, we compared them against the database using blastp and HMMER. bigWigAverageOverBed was used to calculate the phyloP and phastCons scores of novel non-coding transcripts. Salmon and other tools were used to quantify the long-read transcripts with short-read sequencing data. We used PCR amplification and Sanger sequencing to verify the novel genes and isoforms. We used SUPPA2 to identify alternative splicing events and differential splicing events in the two datasets. Allele-specific transcripts were identified using SNPsplit, GMAP, STAR and other tools. By combining a large number of bioinformatics tools with our custom Python and R scripts, we completed the analysis of the short-read and long-read sequencing data of early mouse embryos.

Content and results: Using the combination of next-generation and third-generation sequencing, we revealed the complexity and novelty of the mouse early embryonic transcriptome and identified 2,280 novel transcripts from previously unannotated loci and 6,289 novel splicing isoforms from previously annotated genes. The annotated and novel full-length transcripts were then quantified with short-read sequencing data, and the dynamic expression trends of full-length transcripts across the seven stages were described. Both annotated and novel transcripts were highly expressed in early embryos and showed similar expression patterns. Homology analysis of the novel protein-coding transcripts showed that most of them had homologous products in the database. Conservation analysis of the novel non-coding transcripts showed that a large number of them were highly conserved among species. The novel transcripts were then verified with published H3K4me3 and CAGE data, yielding a high-confidence novel transcript dataset.

We also compared short-read data alone with the combination of short-read and long-read data, both for identifying novel transcripts and for quantifying transcripts. The combination performed significantly better than short-read data alone. However, because the sequencing depth of the long-read data is lower than that of the short-read data, short-read data can also identify some transcripts that long-read data cannot.

We then used the short-read and long-read sequencing data to identify alternative splicing and differential splicing events. Compared with short-read sequencing, the long-read sequencing data identified a higher proportion of novel alternative splicing events. We examined the dynamic changes of alternative splicing and differential alternative splicing across the stages of the early embryo and found that the seven types of splicing events and the differential splicing events changed dramatically, leading to rapid changes in transcripts. We identified a novel isoform of Kdm4dl and a novel non-coding gene designated XLOC_004958. Experiments showed that the novel isoform had a modified mRNA reading frame and that depletion of Kdm4dl or XLOC_004958 led to abnormal blastocyst development.

Comparing the allele-specific transcripts identified from the next-generation and third-generation sequencing data showed that 50-94% of the allele-specific transcripts in the seven stages could be identified only with the third-generation data. We also identified allele-specific alternative splicing events and differential alternative splicing events based on allele-specific transcripts, with an average of 650 allele-specific splicing events and 26 differential splicing events identified in each stage examined.

Conclusion: Our analysis shows that, compared with next-generation sequencing data alone, the combination of next-generation and third-generation sequencing identifies more novel transcripts, alternative splicing events, allele-specific transcripts and allele-specific splicing events, and quantifies the transcriptome more accurately. Using this combination, we provide a high-resolution, more precise transcriptome with allele-specific transcripts and splicing events, which lays a foundation for further exploring the mechanisms of early mouse embryo development. We also observed dramatic dynamic changes in the transcriptome of early embryos, which suggest a potential direction for further study of the regulatory mechanisms of early embryos.
Keywords/Search Tags: Third-generation sequencing technology, Early embryo, Isoform, Alternative splicing, Allele-specific
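
The Methods describe quantifying transcripts with Salmon and identifying splicing events with SUPPA2. As a minimal sketch of the general idea behind deriving an event-level inclusion value from transcript abundances, the proportion spliced in (PSI) of an event can be taken as the summed abundance of the transcripts carrying the inclusion form divided by the summed abundance of all transcripts that define the event. The function name, transcript IDs and TPM values below are hypothetical and chosen for illustration only; this is not the thesis's pipeline or its custom scripts.

```python
from typing import Dict, Iterable, Optional


def psi_from_tpm(
    tpm: Dict[str, float],
    inclusion: Iterable[str],
    total: Iterable[str],
) -> Optional[float]:
    """Return PSI for one splicing event from transcript-level TPMs.

    tpm       -- transcript ID -> abundance (e.g. TPM from a quantifier)
    inclusion -- transcripts that carry the inclusion form of the event
    total     -- all transcripts that define the event (both forms)

    Returns None when the event's transcripts have no expression,
    because the ratio is undefined in that case.
    """
    inc = sum(tpm.get(t, 0.0) for t in inclusion)
    tot = sum(tpm.get(t, 0.0) for t in total)
    if tot <= 0.0:
        return None
    return inc / tot


# Hypothetical abundances for three transcripts of one gene.
example_tpm = {"tx_A": 30.0, "tx_B": 10.0, "tx_C": 5.0}

# tx_A includes the exon, tx_B skips it; tx_C does not span the event.
psi = psi_from_tpm(example_tpm, inclusion=["tx_A"], total=["tx_A", "tx_B"])
print(psi)  # 0.75
```

Returning None for unexpressed events, rather than 0, keeps "not measurable at this stage" distinct from "fully skipped", which matters when comparing PSI across developmental stages.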