Transcript Expression Analysis For Multi-conditional RNA-seq Data

Posted on: 2021-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2480306479460674
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continued development of high-throughput sequencing technology, RNA-Seq (transcriptome sequencing) has become an important tool for transcriptome research and is widely used in biology, medicine and pharmacy. Transcript expression estimation (TEE) and differential transcript usage (DTU) analysis are two central tasks in transcriptomics and are of great significance for the diagnosis and treatment of disease. In TEE, the reads located on each isoform are counted first and then normalized to obtain the isoform expression level, so that the expression levels of different isoforms are comparable across conditions. In DTU analysis, changes in relative transcript abundance (the ratios of isoform expression) are detected across conditions, and functional analysis of the genes is then performed.

Because of the read multi-mapping problem, one read may be located on multiple isoforms, which makes it difficult to determine the number of reads on each isoform and leads to inaccurate isoform expression levels and expression ratios. Current methods for detecting DTU are divided into gene-based and exon-based methods. Gene-based methods are affected by the inaccurate calculation of isoform expression ratios. Exon-based methods infer DTU by detecting differential exon usage (DEU), but DEU and DTU are not completely equivalent.

This thesis addresses these problems in both TEE and DTU analysis. In view of the multi-conditional nature of current RNA-Seq experiments, it proposes a method, MLDA, for TEE and for detecting DTU under multiple conditions. For TEE, to avoid the uncertainty in isoform read counts caused by multi-mapping, MLDA directly models read counts on exons and treats the isoform expression ratio as a random variable. After the model is solved, the read counts on the gene are assigned to each isoform according to the isoform expression ratio, and the expression levels of isoforms and genes are calculated. The accuracy of the transcript expression calculated by MLDA is verified on a simulated dataset and a real dataset, and the experimental results show that MLDA is more accurate than the other methods compared. For DTU analysis, Bayesian generative models of MLDA are proposed under two assumptions, a null model and an alternative model, and a likelihood ratio test is used to detect DTU under multiple conditions. The performance of MLDA is verified on one simulated dataset and three real datasets, and the experimental results show that MLDA has higher accuracy and sensitivity than the other methods compared. To make it available to all users, MLDA has been released as a software package on GitHub and can be downloaded for free from https://github.com/PUGEA/MLDA.
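
The two computational steps named in the abstract, assigning gene-level read counts to isoforms by expression ratio and testing for a change in those ratios across conditions, can be illustrated with a minimal Python sketch. This is not MLDA's implementation: the function names are hypothetical, RPKM is assumed as the normalization (the abstract only says "normalization"), and the test shown is a plain multinomial likelihood ratio statistic (a G-test) standing in for MLDA's Bayesian null and alternative models.

```python
import math


def allocate_gene_reads(gene_reads, isoform_ratios):
    """Split a gene-level read count across isoforms in proportion to the ratios."""
    total = sum(isoform_ratios)
    return [gene_reads * r / total for r in isoform_ratios]


def rpkm(reads, length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads (assumed normalization)."""
    return reads * 1e9 / (length_bp * library_size)


def multinomial_lrt(counts_by_condition):
    """Likelihood ratio statistic for 'same isoform proportions in every condition'.

    counts_by_condition: one list of isoform read counts per condition.
    Null model: a single shared proportion vector (pooled estimate).
    Alternative model: a separate proportion vector per condition.
    Returns (statistic, degrees_of_freedom); the statistic is asymptotically
    chi-square distributed under the null.
    """
    n_iso = len(counts_by_condition[0])
    pooled = [sum(c[i] for c in counts_by_condition) for i in range(n_iso)]
    pooled_total = sum(pooled)
    stat = 0.0
    for counts in counts_by_condition:
        cond_total = sum(counts)
        for i, x in enumerate(counts):
            if x > 0:  # zero counts contribute nothing to the log-likelihood
                p_alt = x / cond_total
                p_null = pooled[i] / pooled_total
                stat += 2.0 * x * math.log(p_alt / p_null)
    df = (len(counts_by_condition) - 1) * (n_iso - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical gene with three isoforms and 1000 reads.
    reads = allocate_gene_reads(1000, [0.7, 0.2, 0.1])
    print([round(r, 1) for r in reads])
    # Hypothetical isoform counts under three conditions.
    print(multinomial_lrt([[70, 20, 10], [40, 50, 10], [65, 25, 10]]))
```

A large statistic relative to a chi-square distribution with the returned degrees of freedom suggests that isoform proportions differ between conditions; the simple test here ignores replicate variability, which a full model would need to account for.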
Keywords/Search Tags: RNA-Seq, transcript expression, differential transcript usage analysis, multiple conditions, likelihood ratio test