
Computational analysis of RNA-Seq data in the absence of a known genome

Posted on: 2014-12-26
Degree: Ph.D.
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Li, Bo
Full Text: PDF
GTID: 2453390005488720
Subject: Biology
Abstract/Summary:
RNA-Seq technology has revolutionized the way we study transcriptomes. In particular, it has enabled us to investigate the transcriptomes of species that have not yet had their genomes sequenced. This thesis focuses on two computational tasks that are crucial to analyzing RNA-Seq data in the absence of a sequenced genome: transcript quantification and de novo transcriptome assembly evaluation.

For transcript quantification, RNA-Seq is considered a more accurate replacement for microarrays. However, to achieve the highest accuracy, methods for analyzing RNA-Seq data must handle reads that map to multiple genes or isoforms. We present RSEM, a generative statistical model of the sequencing process with associated inference methods, which tackles this challenge in a principled manner. Our results on both simulated and real data sets suggest that RSEM performs as well as or better than other quantification methods developed at the same time.

To facilitate the use of our method, we implement RSEM as a robust and user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files, and it can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes.

Building on RSEM, we have developed a novel method based on a probabilistic model, RSEM-EVAL, for evaluating de novo transcriptome assemblies from RNA-Seq data without the ground truth. The RSEM-EVAL score has a broad range of potential applications, such as selecting assemblers, optimizing parameters for an assembler, and guiding new assembler design. Results on both simulated and real data sets show that the RSEM-EVAL score correctly reflects the accuracies of the assemblies. To demonstrate its usage, we assembled the transcriptome of the regenerating axolotl limb by selecting among over 100 candidate assemblies based on their RSEM-EVAL scores.
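
The multi-mapping problem described above can be illustrated with a toy expectation-maximization (EM) loop. This is only a minimal sketch and not RSEM's actual model: it assumes every transcript has the same length, ignores read quality, alignment position, and sequencing error, and takes as input a hypothetical list giving, for each read, the indices of the transcripts it aligns to. The function name `em_abundances` and its parameters are invented for this illustration.

```python
def em_abundances(read_hits, n_transcripts, n_iter=200, tol=1e-10):
    """Toy EM for transcript fractions from ambiguously mapped reads.

    read_hits: one list per read, holding the indices of the transcripts
               that read aligns to (assumed non-empty for every read).
    Returns the estimated fraction of reads originating from each transcript.
    Simplifying assumption: all transcripts have equal length.
    """
    n_reads = len(read_hits)
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting point

    for _ in range(n_iter):
        # E-step: split each read across its candidate transcripts
        # in proportion to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total

        # M-step: re-estimate abundances from the fractional counts.
        new_theta = [c / n_reads for c in counts]

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break

    return theta


# Three reads unique to transcript 0, one unique to transcript 1,
# and four reads that align to both.
reads = [[0]] * 3 + [[1]] * 1 + [[0, 1]] * 4
estimates = em_abundances(reads, n_transcripts=2)
```

In this example the uniquely mapped reads favor transcript 0 by three to one, so the EM loop gradually assigns the ambiguous reads in that same proportion instead of discarding them or splitting them evenly.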
Keywords/Search Tags: RNA-Seq, RSEM, Transcriptome