
A Statistical Normalization Method And Differential Expression Analysis For RNA-seq Data Between Different Species

Posted on: 2020-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: J D Zhu
Full Text: PDF
GTID: 2370330599454551
Subject: Statistics
Abstract/Summary:
As a revolutionary sequencing technology, high-throughput sequencing has become a replacement for hybridization-based microarrays in biological research because of its high throughput, high precision, and low cost. Next-generation sequencing (NGS), represented by Roche 454, Illumina Solexa, and ABI SOLiD technologies, is now widely used in practical biological research. High-throughput techniques bring novel tools to genomic research, but also statistical challenges. Because gene length, sequencing depth, and fragment distribution differ between samples, RNA-seq data usually cannot be analyzed directly. The existing literature provides a variety of methods for normalizing data from the same species and for using the normalized data to perform differential expression analysis of genes.

This thesis considers the normalization of RNA-seq data from different species. Identifying genes that are differentially expressed between species is an effective way to discover evolutionarily conserved transcriptional responses, and it has a major impact on the study of how gene expression levels evolve in mammalian organs and of the role of gene expression levels in medicine. Because total read counts, gene numbers, and gene lengths all differ between species, normalizing RNA-seq data across species is more complex, and methods designed for a single species cannot be applied directly. To compare RNA-seq data from different species, we propose a scale-based normalization (SCBN) method that uses the available knowledge of conserved orthologous genes within a hypothesis testing framework and searches for the optimal scaling factor by minimizing the deviation between the empirical and nominal type I errors.

We compared the SCBN method with the existing normalization method (Median). Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An analysis of a human and mouse RNA-seq dataset is consistent with this conclusion. Finally, to meet practical needs, we developed an R package named "SCBN", which is available from the Bioconductor website.
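The idea of choosing a scaling factor by matching the empirical type I error on conserved orthologous genes to the nominal level can be illustrated with a minimal Python sketch. It is not the thesis's implementation or the API of the "SCBN" R package: the test statistic (a normal approximation to a conditional binomial test), the grid search, the way the scale factor absorbs sequencing depth, and all function names are assumptions made for illustration only.

```python
import math
import random


def equal_expression_pvalue(x, y, len1, len2, scale):
    """Two-sided p-value for H0: one orthologous gene is equally expressed.

    Stand-in test (an assumption, not the thesis's statistic): conditional on
    n = x + y, x is treated as Binomial(n, p0) with
    p0 = len1 / (len1 + scale * len2), using a normal approximation.
    """
    n = x + y
    if n == 0:
        return 1.0
    p0 = len1 / (len1 + scale * len2)
    z = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def empirical_type1_error(genes, scale, alpha):
    """Fraction of conserved genes (assumed truly null) rejected at level alpha."""
    rejected = sum(
        equal_expression_pvalue(x, y, l1, l2, scale) < alpha
        for x, y, l1, l2 in genes
    )
    return rejected / len(genes)


def estimate_scale_factor(genes, alpha=0.05, grid=None):
    """Grid search for the scale whose empirical type I error is closest to alpha."""
    if grid is None:
        # Hypothetical log-spaced grid from 0.25 to 4.0.
        grid = [0.25 * 16 ** (i / 200) for i in range(201)]
    return min(
        grid,
        key=lambda c: abs(empirical_type1_error(genes, c, alpha) - alpha),
    )


def simulate_conserved_counts(n_genes, true_scale, seed=0):
    """Toy data: (count_species1, count_species2, length1, length2) per gene.

    Counts are drawn from a rounded Gaussian approximation to the Poisson
    distribution; every gene is conserved (equal underlying expression).
    """
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        l1 = rng.randint(500, 3000)
        l2 = rng.randint(500, 3000)
        mu = rng.uniform(0.2, 1.0)  # shared per-base expression level
        lam1 = mu * l1
        lam2 = mu * l2 * true_scale
        x = max(0, round(rng.gauss(lam1, math.sqrt(lam1))))
        y = max(0, round(rng.gauss(lam2, math.sqrt(lam2))))
        genes.append((x, y, l1, l2))
    return genes
```

Calling `estimate_scale_factor(simulate_conserved_counts(400, true_scale=2.0))` returns the grid value at which the conserved genes are rejected at a rate closest to the nominal 5%. In this toy model, a badly chosen scale makes nearly every conserved gene look differentially expressed, which is why the deviation from the nominal level is informative about the scale.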
Keywords/Search Tags:RNA-seq, Hypothesis test, Normalization, Differential expression, Orthologous gene