Alternative splicing (AS) and alternative polyadenylation (APA) are widespread post-transcriptional processes in most eukaryotes and play key roles in regulating gene expression. AS and APA greatly increase the diversity and complexity of the transcriptome and proteome, affecting mRNA stability, translation, nuclear transport, and the subcellular localization of mRNAs and proteins during various cellular activities. The massive amount of high-throughput transcriptome data now available provides an opportunity to study how AS and APA are regulated under different conditions. Genome-wide studies have examined the dynamics of AS and APA, identified AS and APA events with differential expression and usage during cell growth, explored the influence of post-transcriptional regulation on biological pathways, and characterized the impact of AS/APA on gene expression and phenotype. However, most current studies of AS or APA rely on a reference genome, and no method is available to identify and classify AS or APA without one, which is a particular obstacle for non-model organisms and species with poor genetic resources. Moreover, there is no convenient, unified platform for storing and analyzing AS/APA data. Different researchers use different sequencing protocols, data processing workflows and bioinformatics tools, producing data in diverse formats, which hinders the integration and comparison of APA results from multiple sources.

Without a reference genome, we comprehensively profiled the transcriptome of Spartina alterniflora using Pacific Biosciences single-molecule real-time (PacBio SMRT) sequencing, RNA-seq and poly(A) site sequencing (PAS-seq) data. AS and APA events were identified, quantified and classified, and their dynamics under different conditions were explored to study the regulatory mechanisms and functions of AS and APA. We also developed two feature-rich R packages, AStrap and movAPA, to analyze AS and APA at the genome-wide level. In addition, we present two databases, SAPacBio and PlantAPAdb, built from a large volume of high-throughput transcriptome data to provide high-quality AS and APA profiles in plants. These databases contain accurate coordinates of AS/APA sites, genome annotations, expression profiles of AS/APA events, related sequences and signals, and information on dynamic regulation and evolutionary conservation across organisms. The main research contents are as follows:

1. Developed the R package AStrap for identifying AS events without a reference genome.
AStrap is a de novo approach that identifies and classifies AS events from full-length transcriptome data using a machine-learning model with more than 500 assembled features. We evaluated AStrap with multiple performance measures, using collected AS events from rice and human and SMRT datasets from Amborella trichopoda. AStrap predicted AS types with high accuracy, achieving an overall accuracy of 0.87 and an average AUC above 0.9. Compared with Liu’s method, AStrap detected more AS events, and 98.85% of its predicted AS events were consistent with the gold standard. We also evaluated the impact of different parameters, sample sizes and training models on its performance, and the results demonstrate that AStrap is robust and highly flexible.

2. Developed the R package movAPA for genome-wide analysis of APA dynamics.
Many studies analyze APA with different pipelines applied to 3’-seq or RNA-seq data from various experimental samples, which makes data integration and comparison challenging, yet flexible, easy-to-use and efficient APA analysis tools are lacking. We developed an R package, movAPA, to analyze APA dynamics at the genome-wide level; it can be used to study the differential expression and usage of APA sites across tissues, stages and cell types. movAPA integrates a wealth of functions for data processing, genome annotation, statistical analysis, signal identification, detection of 3’UTR shortening/lengthening between conditions and of APA site switching involving non-3’UTR regions, and visualization of APA dynamics in diverse ways. Applied to poly(A) site datasets from rice tissues and mouse sperm cells, movAPA effectively characterized the dynamic regulation of APA at the genome-wide level with high scalability and flexibility, and it can be applied to delineate tissue specificity and cellular heterogeneity.

3. Revealed the dynamics of AS and APA under salinity stress in Spartina alterniflora.
Spartina alterniflora is an invasive halophyte that can survive in high-salt environments. However, the lack of a high-quality reference genome has made it difficult to explore its salt tolerance at the molecular level. We applied AStrap and movAPA to study the dynamic responses of AS and APA to salt stress and to reveal their functional roles in the regulation of high-salt tolerance. Many AS and APA events were differentially expressed (DE), and the numbers of isoforms and of DE AS and APA events increased along the salinity gradient. Interestingly, 3’UTR lengthening events increased significantly under high-salt stress, especially in transcripts encoding transporters. Moreover, several transcription factors shared the same poly(A) site usage pattern, in which distal poly(A) sites were preferred under high-salt stress. In addition, experimental validation by collaborators showed that protein expression increased significantly with 3’UTR lengthening, suggesting that 3’UTR switching plays a fundamental role in regulating the salt response and may be a potential salt tolerance mechanism.

4. Established AS and APA databases in plants for data sharing and mining.
Using transcriptome data collected from various sources, we built two AS and APA databases with rich information and user-friendly functions. To help understand how AS and APA contribute to salt tolerance, we present SAPacBio, a database that provides a high-quality transcriptome and AS and APA profiles of Spartina alterniflora under different salinity stress conditions. SAPacBio supports data sharing, querying and visualization, which can facilitate research on non-model organisms and provide insights for studying the regulatory mechanisms of high-salt tolerance in halophytes. We also present PlantAPAdb, a database of accurate coordinates, expression profiles and dynamics of APA in plants. PlantAPAdb catalogs the most comprehensive collection of APA sites in plants, covering six organisms. Users can download APA data by genomic region and search the database by gene name or pathway description, and the returned information can be visualized in various figures and in the JBrowse genome browser. In particular, to help explain the mechanisms underlying 3’UTR switching, PlantAPAdb also provides information on 3’UTR shortening/lengthening between conditions. APA conservation across organisms is also available in PlantAPAdb, providing insight into the evolution of APA across species.

In summary, AStrap can accurately identify and classify AS events without a reference genome, and movAPA provides a unified, flexible and easy-to-use platform for comprehensive APA analysis. We applied AStrap and movAPA to Spartina alterniflora to study the dynamics of AS and APA under salt stress and to explore the mechanisms and functions of AS and APA in high-salt tolerance. Moreover, the user-friendly, feature-rich databases SAPacBio and PlantAPAdb can strengthen data sharing, provide abundant AS and APA resources for researchers, and help identify potential driver genes or sites. In-depth, genome-wide study of the dynamic regulation of AS and APA will help explore the functional diversity and interrelationships of AS/APA transcripts from a systems biology perspective, at a higher resolution than traditional gene expression analysis. The tools and platforms developed here provide rich resources for deciphering genetic information.
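The idea behind detecting 3’UTR shortening/lengthening between conditions, as attributed to movAPA above, can be illustrated with a minimal Python sketch. It assumes a gene with exactly two poly(A) sites in its 3’UTR (proximal and distal), summarizes each condition by a read-count-weighted mean 3’UTR length, and tests whether site usage shifts between conditions with a two-sided Fisher's exact test. This is not movAPA's implementation or API (movAPA is an R package); the helper names, the choice of test and the read counts in the example are hypothetical and only illustrate the concept.

```python
from math import comb

# Minimal sketch with HYPOTHETICAL helper names (not the movAPA API): compare
# proximal vs distal poly(A) site usage of one gene between two conditions.


def weighted_utr_length(utr_lengths, counts):
    """Read-count-weighted mean 3'UTR length over a gene's poly(A) sites."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at any poly(A) site")
    return sum(length * n for length, n in zip(utr_lengths, counts)) / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def classify_utr_switch(prox_ctrl, dist_ctrl, prox_trt, dist_trt, alpha=0.05):
    """Label a two-site gene as 'lengthening', 'shortening' or 'none'."""
    if prox_ctrl + dist_ctrl == 0 or prox_trt + dist_trt == 0:
        raise ValueError("each condition needs at least one read")
    p = fisher_exact_two_sided(prox_ctrl, dist_ctrl, prox_trt, dist_trt)
    distal_frac_ctrl = dist_ctrl / (prox_ctrl + dist_ctrl)
    distal_frac_trt = dist_trt / (prox_trt + dist_trt)
    if p >= alpha:
        return "none", p
    # A higher share of distal-site reads means a longer 3'UTR on average.
    label = "lengthening" if distal_frac_trt > distal_frac_ctrl else "shortening"
    return label, p


if __name__ == "__main__":
    # Hypothetical gene: proximal site 200 nt and distal site 800 nt into the 3'UTR.
    utr_lengths = [200, 800]
    control, salt = [80, 20], [30, 70]  # made-up read counts (proximal, distal)
    print(weighted_utr_length(utr_lengths, control))  # 320.0
    print(weighted_utr_length(utr_lengths, salt))     # 620.0
    label, p = classify_utr_switch(*control, *salt)
    print(label)  # lengthening
```

With these made-up counts, distal-site usage rises from 20% to 70% and the weighted 3’UTR length from 320 to 620 nt, so the gene is labeled as lengthening. A genome-wide analysis would also need to handle genes with more than two poly(A) sites and to correct the resulting p-values for multiple testing.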