Improved Transcriptome Expression Analysis for RNA-Seq Data

Posted on: 2016-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: X X Shi
GTID: 2180330479476579
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the high-throughput sequencing technology RNA-Seq has been widely used to quantify gene and isoform expression in transcriptome studies. Compared with traditional methods, RNA-Seq offers a higher signal-to-noise ratio, higher resolution, and lower sample requirements. However, the analysis of RNA-Seq data still faces serious challenges, such as the ambiguous mapping of reads to the reference transcriptome and the non-uniform distribution of reads along it. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method uses latent variables to model the unobserved isoform from which each read originates. The input to the model is derived from the annotation file and the read mapping results, and the model is solved with a variational EM algorithm. Isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of the read distribution. In addition, by introducing a 'pseudo-exon' and a 'pseudo-transcript', junction reads and noise reads are treated properly, which reduces the errors caused by noise reads and splice junctions. We use three real datasets and one simulated dataset to evaluate the accuracy of our method in estimating gene and isoform expression, and compare the results with popular alternatives. The results show that NLDMseq obtains more accurate isoform and gene expression measurements than the other approaches and is computationally faster than its competitors. Finally, the proposed method is applied to the detection of differential expression to show its usefulness in downstream analysis. We have implemented NLDMseq as Python/C software, which is publicly available on GitHub.
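The idea of treating the isoform of origin as a latent variable can be illustrated with a much simpler model than the one in the thesis. The sketch below is not NLDMseq: it uses a plain EM algorithm rather than variational EM, and it ignores sequencing bias, the pseudo-exon, and the pseudo-transcript. The function name `em_isoform_fractions`, the 0/1 compatibility matrix, and the effective lengths are all assumptions made for illustration. Each multi-mapped read is split across its compatible isoforms in proportion to the current abundance estimates, and the abundances are then re-estimated from those fractional assignments.

```python
def em_isoform_fractions(compat, eff_len, n_iter=500, tol=1e-10):
    """Estimate the fraction of reads originating from each isoform.

    compat  : list of rows, one per read; compat[i][k] is 1 if read i maps
              to isoform k and 0 otherwise (hypothetical input format).
    eff_len : effective length of each isoform (hypothetical values).
    Assumes every read is compatible with at least one isoform.
    """
    n_iso = len(eff_len)
    theta = [1.0 / n_iso] * n_iso  # start from uniform read fractions
    for _ in range(n_iter):
        counts = [0.0] * n_iso
        for row in compat:
            # E-step: posterior probability that the read came from each
            # isoform, proportional to theta_k / length_k if compatible.
            w = [c * t / l for c, t, l in zip(row, theta, eff_len)]
            total = sum(w)
            for k in range(n_iso):
                counts[k] += w[k] / total
        # M-step: new read fractions are the normalized expected counts.
        new_theta = [c / len(compat) for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy example: two reads map only to isoform 0, one only to isoform 1,
# and one read maps ambiguously to both.
reads = [[1, 0], [1, 0], [1, 1], [0, 1]]
theta = em_isoform_fractions(reads, eff_len=[1000.0, 1000.0])
print(theta)
```

In this toy example the ambiguous read is assigned mostly to isoform 0, because the uniquely mapped reads already suggest that isoform 0 is more abundant.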
Keywords/Search Tags: RNA-Seq, transcriptome expression, multi-mapping, generative model, non-uniformity