
Construction Of Full-length Plant Transcript Database Based On Third-generation Transcriptome Sequencing And Rice Hybrid Data Phasing

Posted on: 2021-02-21  Degree: Master  Type: Thesis
Country: China  Candidate: J W Feng  Full Text: PDF
GTID: 2370330611983354  Subject: Bioinformatics
Abstract/Summary:
Third-generation transcriptome sequencing, Iso-Seq (Isoform Sequencing), is a transcriptome sequencing method developed in recent years. Its main advantage over next-generation sequencing is read length: long reads allow the full transcriptome to be characterized without assembly. This study completed two tasks based on Iso-Seq data. One was constructing a full-length plant transcript database, and the other was phasing hybrid rice sequencing data in combination with other omics sequencing data. The main research contents are as follows:

1. Construction of PISO (Plant ISOform sequencing database), a plant full-length transcript database

Iso-Seq data from a total of 19 plant species were used to build the plant full-length transcript database. Depending on the availability of a reference genome and on ploidy, three different pipelines were applied to the 19 species to perform transcript identification, new gene discovery, and identification of alternative splicing (AS) and alternative polyadenylation (APA) events. Based on the processed data, a plant full-length transcript database (PISO) was constructed. We obtained 1,391,165 transcripts, 50,803 new gene loci, 878,057 AS events and 81,416 APA events. We built the Transcript Browser and Alternative Splicing Search for retrieving processed data in PISO. In addition, four tools were developed based on full-length transcripts: function search, BLAST, Full-length Match and GBrowse. These tools help users locate full-length transcripts by annotation, sequence, and genomic location. In summary, a comprehensive database of plant full-length transcripts was constructed by collecting and processing published plant Iso-Seq data.

2. Phasing of multi-omics hybrid rice sequencing data

In this study, we developed a pipeline for phasing hybrid sequencing data based on the two parental genomes. With this pipeline, we can phase hybrid Iso-Seq, RNA-Seq and Whole Genome Bisulfite Sequencing (WGBS) data. Compared with RNA-Seq, Iso-Seq data phasing was able to separate the whole gene structure and produced a significantly higher ratio of successful separation. We were therefore able to analyze the differences in alternative splicing, allele expression and DNA methylation between the parents and offspring, as well as the differences between offspring alleles. By constructing an allele co-expression network, we also studied the interactions between hybrid alleles. We found that differential expression between the parents and between the parental alleles in the offspring implied a preference for trans-regulation in different tissues and conditions. By comparing allele-specific DNA methylation, we found that CG methylation was more strongly inherited than CHG and CHH methylation, and that it was more enriched in genic regions. In summary, the transcriptome and DNA methylation changes between rice hybrids and their parents were analyzed by constructing a hybrid phasing strategy based on two genomes.
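
The abstract does not describe how reads are assigned to a parental genome, so the following is only a minimal sketch of one common approach, not the thesis pipeline. It assumes each read has already been reduced to the bases it carries at known positions, and that the two parental genomes differ at some of those positions. The function name `phase_read`, the dictionary inputs, and the `min_margin` threshold are all hypothetical.

```python
from collections import Counter


def phase_read(read_alleles, parent_a, parent_b, min_margin=2):
    """Assign a read to parent 'A', parent 'B', or 'unassigned'.

    read_alleles: dict mapping position -> base observed in the read
    parent_a, parent_b: dicts mapping position -> base in each parental genome
    min_margin: hypothetical threshold; the winning parent must lead by
                at least this many informative sites
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        a = parent_a.get(pos)
        b = parent_b.get(pos)
        # Skip sites that are unknown in either parent or identical in both:
        # they carry no information about parental origin.
        if a is None or b is None or a == b:
            continue
        if base == a:
            votes["A"] += 1
        elif base == b:
            votes["B"] += 1
    if votes["A"] - votes["B"] >= min_margin:
        return "A"
    if votes["B"] - votes["A"] >= min_margin:
        return "B"
    return "unassigned"


# Toy data (made up for illustration): positions 10 and 30 differ
# between the parents, position 20 does not.
parent_a = {10: "A", 20: "C", 30: "G"}
parent_b = {10: "T", 20: "C", 30: "A"}
label = phase_read({10: "A", 20: "C", 30: "G"}, parent_a, parent_b)
```

Under these assumptions, a long read that spans more informative sites is more likely to clear the margin than a short one, which is consistent with the abstract's observation that Iso-Seq reads separate more successfully than RNA-Seq reads.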
Keywords/Search Tags:full-length transcripts, database, hybrid phasing, rice, allele