Research On Switchgrass Gene Annotation And Regulatory Network In Response To Salt And Drought Stresses Based On Transcriptomic Data

Posted on: 2019-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C M Zuo
GTID: 1360330572450440
Subject: Bioinformatics
Abstract/Summary:
Switchgrass (Panicum virgatum L.) is a perennial grass native to North America and is considered a major biofuel crop for cellulosic ethanol production because of its strong adaptability and high biomass production. Previous studies have predicted that, as the world's population continues to grow over the next 40 years, agricultural production will need to increase by some 60% to meet demand; however, a quarter of all agricultural land has already suffered degradation. To reduce competition for land with other crops and to improve economic efficiency, switchgrass is grown primarily on marginal lands, which tend to be affected by abiotic stresses such as drought and salt. Recent studies suggest that these two stresses are the major factors that may limit biofuel production. Hence, understanding how plants adapt to these two stresses is an important task.

To date, only some specific physiological, morphological and metabolic data and certain proteins are known to be related to stress tolerance in switchgrass, and the lowland ecotype tends to be more tolerant to drought and salt stresses than the upland ecotype. However, what specifically makes the lowland ecotype more tolerant to these stresses than the upland ecotype remains to be elucidated. Moreover, gene-centric approaches cannot clarify the complex relationship between metabolism and regulation in plants responding to stress. To address these problems, two gene regulatory networks of switchgrass in response to drought and salt stresses were constructed and analyzed, based on time-course gene expression data from the two switchgrass ecotypes. This study could provide new information about the complex regulatory mechanisms of switchgrass under these stresses and help to develop more stress-tolerant plants. In addition, alternative splicing (AS) has recently been implicated as playing a major role in post-transcriptional regulation during stresses, including disease, heat, drought and others; hence, this thesis also aims to study further the role of AS in plant drought and salt tolerance.

At present, the switchgrass genome and gene structure annotation have the following problems: the assembly is only partial, and numerous AS transcripts are yet to be uncovered. A key challenge lies in reconstructing the genome and transcripts from short reads; moreover, short-read assembly generates low-quality transcripts, leading to incorrect annotations. The emergence of PacBio single-molecule real-time sequencing technology promises more accurate elucidation of full-length (FL) transcripts in all organisms, especially heterozygous polyploids such as switchgrass. Specifically, the PacBio technique eliminates the need for sequence assembly because it can sequence reads up to 50 kbp long, thereby providing direct evidence for the spliced transcripts of a gene. In addition, this technology can be used to improve the accuracy of existing gene models. However, compared with second-generation sequencing technology, it has its own limitations: a higher sequencing error rate (~15%) and lower throughput. Although researchers have designed algorithms and software to correct PacBio sequencing data, the standard data analysis pipeline wastes sequencing data; specifically, it applies no further correction or analysis to most non-FL transcripts. Given these problems, a statistical model and a data analysis method were designed to process the switchgrass PacBio data, in order to identify all transcripts (including non-FL transcripts) and new information that can be used to improve the switchgrass genome assembly and gene annotation. This study could provide a foundation for further molecular biology research on switchgrass.

The detailed contents of this research are as follows:

(1) Analyses of Drought and Salt Stress Responses Based on Second-Generation Sequencing Data

By comparing the treated samples with the matching untreated samples, 2,441 and 2,429 key stress-responding genes, including transcription factors (TFs) and target genes (TGs), were identified for drought and salt stress, respectively. The co-expression method was used to calculate the initial regulatory relationships between TFs and TGs, and the network component analysis method was used to process the initial gene regulatory network and gene expression data to reconstruct TF activities (TFAs) and control strengths (CSs) and to predict the final gene regulatory network. Based on two independent data sources on TF-TG relationships in model species (such relationships tend to be conserved across related plants), and on the expectation that TGs regulated by the same TF should have highly related or similar functions, we found that more than 40% of the regulatory relationships were validated. By combining the gene regulatory network with a temporal Markov model, we constructed a dynamic regulatory network for each ecotype under each stress type and derived a system-level understanding of how different metabolic processes are linked to regulatory relationships during adaptation to stress. Comparative analyses of the commonalities and differences between the network models of the two ecotypes were used to infer that the lowland and upland ecotypes have different regulatory systems for regulating cell osmotic pressure and for maintaining intracellular ion balance and redox balance. In addition, these results were supported by relevant metabolic data and the literature.

(2) Identification and Analyses of Transcripts Based on PacBio Sequencing Data

The PacBio standard data analysis method was used to process the switchgrass PacBio data set, identifying 265,773 accurate FL transcripts; based on the genomic mapping of these FL transcripts, we found that 6,195 genomic regions are possibly mis-assembled. We designed a statistical model and a data analysis pipeline to correct non-FL transcripts, based on the known base-pair mismatch error rate characteristics of PacBio raw sequencing data. After applying it to the non-FL data, we identified 657,991 accurate non-FL transcripts; based on the genomic mapping of these non-FL transcripts, we found that 2,549 genomic regions are possibly mis-assembled. From the accurate FL and non-FL transcripts above, we identified 105,419 unique transcripts, of which ~77,000 are new and ~37,000 are non-FL. By comparing these transcripts with the switchgrass gene annotation, we found that 16,640 genes are possibly mis-annotated, and 60,096 AS transcripts were predicted from these PacBio transcripts. In addition, the above results were supported by Illumina and Sanger sequencing data.

This study aims to explore the roles of transcriptional and post-transcriptional regulation mechanisms of switchgrass in response to drought and salt stresses, based on its transcriptomic data. In future work, we will improve the switchgrass genome and gene annotation using the information identified in the second part of this work, and then explore the role of AS events in the switchgrass response to these two stresses, based on the new genome and gene structure annotation.
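The co-expression step described in part (1) can be illustrated with a minimal sketch. The abstract does not state which correlation measure or cutoff was used, so the Pearson correlation, the 0.9 threshold, the function names and the toy expression profiles below are all assumptions made for illustration only. The subsequent network component analysis step is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def initial_network(tf_expr, tg_expr, threshold=0.9):
    """Keep TF-TG pairs whose absolute correlation reaches the threshold.

    The threshold is a hypothetical cutoff, not a value from the thesis.
    """
    edges = {}
    for tf, tf_profile in tf_expr.items():
        for tg, tg_profile in tg_expr.items():
            r = pearson(tf_profile, tg_profile)
            if abs(r) >= threshold:
                edges[(tf, tg)] = r
    return edges


# Toy time-course profiles (four time points); gene names are made up.
tf_expr = {"TF_a": [1.0, 2.0, 3.0, 4.0]}
tg_expr = {
    "TG_1": [2.0, 4.0, 6.0, 8.0],  # rises with TF_a
    "TG_2": [4.0, 3.0, 2.0, 1.0],  # falls as TF_a rises
    "TG_3": [1.0, 3.0, 1.0, 3.0],  # only weakly related to TF_a
}

edges = initial_network(tf_expr, tg_expr)
for (tf, tg), r in sorted(edges.items()):
    print(f"{tf} -> {tg}: r = {r:+.2f}")
```

In this toy example, both the strongly correlated and the strongly anti-correlated target are kept as candidate edges, since a TF may activate or repress its targets; the weakly correlated target is dropped. A candidate network of this kind would then serve as the starting topology for estimating TF activities and control strengths.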
Keywords/Search Tags: switchgrass, transcriptomic analysis, drought and salt tolerance, gene regulatory network, PacBio sequencing, alternative splicing, genome assembly and gene annotation