RNA-Seq Data Analysis Based on a Probabilistic Model

Posted on: 2013-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhao
Full Text: PDF
GTID: 2230330362970899
Subject: Computer application technology
Abstract/Summary:
Estimating gene and isoform expression levels is a fundamental task in transcriptomics, and the accuracy of downstream data analysis depends on how accurately these levels can be measured. RNA-Seq uses next-generation high-throughput sequencing technologies to study the transcriptome and is considered a revolutionary tool for transcriptomics. It provides a 'digital' method for measuring expression levels. However, several features of RNA-Seq data make it challenging to measure gene and isoform expression levels accurately.

First, RNA-Seq reads are usually short, so a single read cannot cover a whole transcript. Second, alternative splicing, which occurs in most eukaryotes, leads to multi-mapping when reads are mapped to transcript reference sequences. Third, because of experimental bias, gene structure characteristics, and sequencing errors, reads are distributed non-uniformly across the reference sequence. An accurate model is therefore needed to capture these features of RNA-Seq data and compute accurate expression levels.

When reads are mapped to exons, RNA-Seq data are structurally similar to text data: exons are analogous to words, isoforms to topics, and all observed exons of a gene to a document. This thesis exploits that similarity by applying latent Dirichlet allocation (LDA), a generative probabilistic model for text data, to the analysis of RNA-Seq data. The proposed probabilistic model, NU-LDA, measures gene and isoform expression levels under the assumption that reads are non-uniformly distributed across the reference sequences.

Compared with an alternative model, rSeq, on real experimental data, the proposed model gives more accurate results.
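
The exon/word, isoform/topic, gene/document analogy can be illustrated with a small sketch. This is not the thesis's NU-LDA model: it is a simplified single-gene mixture, fitted by EM, in which each isoform is a fixed distribution over exons and only the isoform fractions are estimated. The function names, the per-exon `weights` used to stand in for non-uniform read coverage, and the toy numbers are all assumptions made for illustration.

```python
def exon_given_isoform(structure, lengths, weights):
    """Return p(exon | isoform) for each isoform (assumed form, not from the thesis).

    structure[k][e] is 1 if isoform k contains exon e, else 0.
    Each included exon gets mass proportional to length * bias weight.
    """
    rows = []
    for included in structure:
        mass = [inc * l * w for inc, l, w in zip(included, lengths, weights)]
        total = sum(mass)
        rows.append([m / total for m in mass])
    return rows


def estimate_isoform_fractions(counts, structure, lengths, weights=None, iters=200):
    """EM estimate of the fraction of reads from each isoform of one gene.

    counts[e] is the number of reads mapped to exon e (the "word counts"),
    isoforms play the role of topics, and the gene is the single document.
    weights[e] is a hypothetical per-exon bias factor; None means uniform.
    """
    if weights is None:
        weights = [1.0] * len(lengths)
    beta = exon_given_isoform(structure, lengths, weights)
    k = len(structure)
    theta = [1.0 / k] * k  # start from equal isoform fractions
    for _ in range(iters):
        expected = [0.0] * k
        for e, n in enumerate(counts):
            # E-step: share the reads on exon e among compatible isoforms.
            joint = [theta[j] * beta[j][e] for j in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # no isoform explains this exon; skip its reads
            for j in range(k):
                expected[j] += n * joint[j] / z
        # M-step: renormalise the expected read counts into fractions.
        total = sum(expected)
        theta = [x / total for x in expected]
    return theta


# Toy gene: isoform A uses exons 1-3, isoform B skips exon 2.
structure = [[1, 1, 1], [1, 0, 1]]
lengths = [100, 100, 100]
counts = [400, 200, 400]
uniform = estimate_isoform_fractions(counts, structure, lengths)
biased = estimate_isoform_fractions(counts, structure, lengths, weights=[1.0, 2.0, 1.0])
```

On this toy gene the uniform model assigns about 0.6 of the reads to isoform A, whereas assuming exon 2 is covered twice as densely lowers that to about 0.4. The same read counts therefore imply different isoform fractions once coverage is allowed to be non-uniform, which is the motivation for modelling non-uniformity explicitly.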
Keywords/Search Tags: RNA-Seq, gene expression, multi-mapping, LDA, probabilistic model