Algorithms For RNA Alternative Splicing And Secondary Structure Analysis Based On High-throughput Sequencing Data

Posted on: 2021-03-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J Y Zhou | Full Text: PDF
GTID: 1480306542996509 | Subject: Computer Science and Technology
Abstract/Summary:
Transcriptomics is one of the most important research fields in molecular biology. Over the past decade, the advent of high-throughput sequencing technologies has driven unprecedented progress in transcriptomics research, and the resulting discoveries continue to reshape our understanding of life. However, because sequencing reads are short and enormous in number, downstream data analysis is impossible without computational support, which makes powerful mathematical models and efficient algorithms central to transcriptomics. This thesis addresses two complementary classes of RNA in the transcriptome: coding RNA and non-coding RNA. Based on high-throughput sequencing data, we develop novel algorithms and perform data analyses focused on a crucial biological aspect of each class: alternative splicing of coding RNA and secondary structure of non-coding RNA.

Alternative splicing plays a pivotal role in gene expression in eukaryotic organisms, and the exon-inclusion ratio (percent spliced in, PSI) is often regarded as one of the most effective measures of alternative splicing events. We propose FreePSI, the first alignment-free method that can estimate exon-inclusion ratios without the guidance of a reference transcriptome. FreePSI uses a novel probabilistic generative model to quantify exon-inclusion ratios at the genome scale and solves the model with an efficient expectation-maximization (EM) algorithm based on a divide-and-conquer strategy. We compare FreePSI with existing methods on simulated and real RNA-seq data in terms of both accuracy and efficiency, and show that it performs very well even when no reference transcriptome is provided. These results suggest that FreePSI may have important applications in alternative splicing analysis of organisms that lack reliable reference transcriptomes. In addition, through computational analysis of alternative splicing, we find that abnormal splicing of the gene ESRRG leads to deterioration of the disease in SF3B1-mutated prolactinomas.

The function of non-coding RNA is determined mainly by its secondary structure, yet in vivo RNA secondary structures remain enigmatic. PARIS is a recently developed approach based on high-throughput sequencing that captures in vivo RNA duplex structures directly. However, the unusual form of the information carried by PARIS reads hinders the integration of PARIS data with existing RNA secondary structure prediction tools. Here we introduce IRIS, a method for predicting in vivo RNA secondary structure ensembles from PARIS data. IRIS uses a Bayesian model that predicts the secondary structure ensemble by combining thermodynamic principles with PARIS data. The predicted ensembles are supported by evidence from evolutionary conservation and by their consistency with other experimental RNA structure data. As the first method to use PARIS data to predict complete in vivo RNA secondary structures, IRIS broadens the application of PARIS data to in vivo RNA secondary structure prediction. We also propose the concept of an RNA secondary structural domain for long RNAs based on PARIS data and develop an algorithm for domain partitioning, opening up a new way to study the structure of long RNAs.
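The FreePSI description mentions estimating exon-inclusion ratios by running an EM algorithm on a probabilistic generative model. The Python sketch below is not FreePSI's model: FreePSI is alignment-free and needs no reference transcriptome, whereas this toy assumes reads have already been sorted into compatibility classes for a single cassette exon with two isoforms, and that read start positions are uniform along each isoform. It only illustrates the general E-step/M-step pattern for splitting ambiguous reads between an inclusion isoform and an exclusion isoform. The function name `estimate_psi`, the isoform labels `"inc"`/`"exc"`, and the example counts are hypothetical.

```python
from typing import Dict, FrozenSet


def estimate_psi(
    class_counts: Dict[FrozenSet[str], int],
    eff_lengths: Dict[str, float],
    max_iter: int = 10_000,
    tol: float = 1e-12,
) -> float:
    """Toy EM estimate of the exon-inclusion ratio (PSI) for one cassette exon.

    Hypothetical setup (not FreePSI): class_counts maps a compatibility class
    (the set of isoforms a read could have come from) to its read count, and
    eff_lengths gives each isoform's effective length. Isoforms must be named
    "inc" (exon included) and "exc" (exon skipped).
    """
    isoforms = list(eff_lengths)
    # alpha[k]: relative molecular abundance of isoform k; start uniform.
    alpha = {k: 1.0 / len(isoforms) for k in isoforms}
    for _ in range(max_iter):
        # E-step: with uniform read positions, a read's likelihood under a
        # compatible isoform k is proportional to alpha[k], so each class is
        # split among its compatible isoforms in proportion to alpha.
        expected = {k: 0.0 for k in isoforms}
        for cls, n in class_counts.items():
            denom = sum(alpha[k] for k in cls)
            for k in cls:
                expected[k] += n * alpha[k] / denom
        # M-step: abundance is proportional to expected reads per unit length
        # (longer isoforms yield more reads per molecule); renormalise to 1.
        raw = {k: expected[k] / eff_lengths[k] for k in isoforms}
        total = sum(raw.values())
        new_alpha = {k: raw[k] / total for k in isoforms}
        converged = max(abs(new_alpha[k] - alpha[k]) for k in isoforms) < tol
        alpha = new_alpha
        if converged:
            break
    # PSI: fraction of molecules that include the exon.
    return alpha["inc"] / (alpha["inc"] + alpha["exc"])


# Synthetic counts, constructed so that the maximum-likelihood PSI is 0.8.
counts = {
    frozenset({"inc"}): 120,        # reads touching the alternative exon
    frozenset({"exc"}): 10,         # reads spanning the skipping junction
    frozenset({"inc", "exc"}): 150, # reads in shared constitutive sequence
}
lengths = {"inc": 300.0, "exc": 200.0}
print(round(estimate_psi(counts, lengths), 4))  # prints 0.8
```

Dividing expected counts by effective length is the step that separates molecule abundance from read abundance: in this example the inclusion isoform is 1.5 times longer, so a naive read-count ratio would overstate PSI.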
Keywords/Search Tags:RNA alternative splicing, RNA secondary structure, Algorithms in computational biology, RNA-Seq data, PARIS data