
Assessment Of The Performance Of RNA-Seq By Using External RNA Control Spike-Ins

Posted on: 2014-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Qing
Full Text: PDF
GTID: 2180330434972160
Subject: Pharmacology
Abstract/Summary:
RNA-Seq, a recently developed high-throughput cDNA sequencing technology, shows many advantages in the characterization and quantification of transcriptomes. It allows more accurate quantification of expression at the gene and transcript levels. With its single-base resolution, RNA-Seq is also capable of discovering structural changes in the transcriptome, such as SNPs, gene fusion events, and splice junctions. RNA-Seq promises to be used in clinical settings as a gene-expression profiling tool; however, questions about its variability and biases remain and need to be addressed before its widespread application. Thus, RNA spike-in controls with known concentrations and sequence identities, originally developed by the External RNA Control Consortium (ERCC) for microarray and qPCR platforms, have recently been proposed for evaluating the performance of RNA-Seq platforms. However, the number of samples and experimental factors examined so far has been limited.

In this study, we report our analysis of RNA-Seq data from 92 ERCC controls spiked into a diverse collection of 447 RNA samples from eight ongoing studies involving five species (human, rat, mouse, chicken, and Schistosoma japonicum) and two mRNA enrichment protocols, i.e. poly(A) and Ribo-Zero. This rich data source allows us to explore the impact of various factors on the performance of RNA-Seq.

Firstly, the assessment of base-calling error on the Illumina sequencing platform indicated that the first five bases of raw reads and the bases with a Q value less than 25 are associated with a high error rate. This finding helps improve the preprocessing of RNA-Seq data and the optimization of the raw-read mapping pipeline.

Secondly, we summarized the ratio of reads mapped to the ERCC transcripts out of the total number of sequenced reads. In our analysis, the entire collection of datasets consisted of 15,650,143,175 short sequence reads, 131,603,796 (i.e. 0.84%) of which were mapped to the 92 ERCC references. The overall ERCC mapping ratio of 0.84% is close to the expected value of 1.0% when assuming a 2.0% mRNA fraction in total RNA, but it varied by 2.8-fold across studies and by 4.3-fold among samples from the same study with the same tissue type. This level of fluctuation may prevent the ERCC controls from being used for cross-sample normalization in RNA-Seq. Furthermore, we found that only a tiny fraction of the reads from ERCC transcripts could be mapped to any of the genomes of the five species, suggesting that the spiked-in ERCC transcripts have no obvious impact on the detection of endogenous transcripts in the samples.

Thirdly, in order to evaluate the influence of different protocols on RNA-Seq quantification, we compared the normalized read counts of the 92 ERCC transcripts with their actual concentrations. A good linear relationship was observed for highly abundant transcripts; however, the relationship was much weaker for transcripts at low abundance. This abundance-dependent deviation from the truth was also reflected in the differential expression analysis. In addition, we observed striking transcript-specific differences in quantification between poly(A) and Ribo-Zero. For example, ERCC-00116 showed a 7.3-fold under-enrichment in poly(A) compared to Ribo-Zero. Extra care is needed in integrative analysis of multiple datasets, and technical artifacts of protocol differences should not be taken as true biological findings.

Finally, we observed a positive correlation between sequencing depth and the correlation coefficient of ERCC expression profiles.

In summary, we evaluated the quality of RNA-Seq technology in expression quantification and base-level sequencing error with the ERCC spike-in controls. The data presented here will contribute to the reliability assessment of RNA-Seq technology, the improvement of data analysis pipelines, and eventually the effective application of RNA-Seq in clinical practice.
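
The overall ERCC mapping ratio reported in the abstract can be checked with simple arithmetic. The following is a minimal sketch in Python using only the two read counts and the expected value quoted above; the variable names are illustrative and not taken from the thesis, and the 1.0% expected value is copied from the abstract rather than derived.

```python
# Read counts as reported in the abstract.
total_reads = 15_650_143_175   # all sequenced reads across the datasets
ercc_reads = 131_603_796       # reads mapped to the 92 ERCC references

# Expected ERCC share quoted in the abstract (1.0%); taken as given here.
expected_ratio = 0.010

# Fraction of all reads that mapped to ERCC transcripts.
mapping_ratio = ercc_reads / total_reads

# How the observed share compares with the quoted expectation.
observed_vs_expected = mapping_ratio / expected_ratio

print(f"ERCC mapping ratio: {mapping_ratio:.2%}")
print(f"Observed / expected: {observed_vs_expected:.2f}")
```

The same per-sample calculation, applied to each dataset separately, is what would expose the 2.8-fold and 4.3-fold spreads described above; the per-sample counts are not given in the abstract, so they are not reproduced here.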
Keywords/Search Tags: RNA-Seq, External RNA Control Consortium (ERCC), MAQC/SEQC, mRNA enrichment protocol, Quality control, Reproducibility, Quantification bias, Poly(A) versus Ribo-Zero