
Research On Differential Expression And Clustering Analyses Of RNA-Seq Data

Posted on: 2019-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: X F Shi
Full Text: PDF
GTID: 2428330596450376
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, low-cost, high-throughput RNA-Seq technology has been widely adopted and has produced large amounts of read data, which make it possible to estimate gene expression and carry out subsequent analysis. The aim of RNA-Seq data analysis is to obtain biologically meaningful results through gene expression calculation and the analysis that follows it, and thereby to support the inference of biological conclusions. The subsequent analysis of RNA-Seq data is therefore of vital importance. Differential expression (DE) analysis and clustering analysis of genes are the two main tasks of subsequent gene expression analysis, and both are important tools for discovering unknown gene functions. Differential expression analysis explores unknown gene functions by detecting genes that are differentially expressed under different environmental conditions. Clustering analysis groups genes with similar expression profiles and in this way also reveals unknown gene functions. Because read counts are not uniformly distributed, they are usually modeled with a negative binomial distribution. In the subsequent analysis of RNA-Seq data, some existing algorithms model the read counts directly without fully considering the various kinds of noise present in the experiment and the uncertainty in the measurement of gene expression, or they do not take sufficient account of the uncertainty across biological replicates. The main purpose of this thesis is to address these deficiencies in differential expression and clustering analyses.

In differential expression analysis there is a further shortcoming beyond those above: many current RNA-Seq experiments involve multiple conditions, yet most differential expression algorithms consider only two experimental conditions, and multi-condition differential expression analysis is still under study. This thesis proposes the PUseqDE (propagating uncertainty into multi-condition RNA-Seq Differential Expression analysis) method for differential expression analysis. The PGSeq model is first used to obtain the gene expression and the associated technical uncertainty from the RNA-Seq data. Then, in a hypothesis-testing setting, two Bayesian hierarchical models are designed: a null model and an alternative model. Finally, a likelihood ratio test is used for differential expression analysis. PUseqDE was validated on simulated data sets and two real data sets. Experimental results show that this method has higher sensitivity and accuracy than other methods.

In clustering analysis, current methods share the shortcomings described above. Moreover, some clustering algorithms cannot determine the optimal number of clusters, or cannot fully account for the uncertainty of the cluster centers. This thesis proposes the PUseqClust (propagating uncertainty into RNA-Seq clustering) framework for clustering RNA-Seq data. The framework first uses PGSeq to model the stochastic process of read generation. The Laplace method is then used to calculate the correlation between expression estimates across conditions and replicates, which helps to obtain an accurate uncertainty for the expression estimates. Finally, a Student's t mixture model is used to cluster the gene expression. Results show that the proposed method obtains more biologically relevant clustering results.
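
The idea of a likelihood ratio test that compares a null model with an alternative model across several conditions can be illustrated with a deliberately simplified sketch. It is not the PUseqDE implementation: instead of the Bayesian hierarchical models of the thesis, it assumes a Gaussian model for log-scale expression in which each measurement carries a known variance (standing in for the technical uncertainty), and all function names and the input layout are hypothetical.

```python
import math


def weighted_mean(values, variances):
    """Precision-weighted mean: the MLE of a shared mean when variances are known."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def weighted_rss(values, variances, mean):
    """Sum of squared residuals, each scaled by its known variance."""
    return sum((y - mean) ** 2 / v for y, v in zip(values, variances))


def chi2_sf(x, df):
    """Chi-square survival function for integer df >= 1.

    Uses Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1) with t = x / 2,
    starting from Q(1, t) = exp(-t) (even df) or Q(1/2, t) = erfc(sqrt(t)) (odd df).
    """
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_multi_condition(conditions):
    """Likelihood ratio test for one gene measured under several conditions.

    conditions: list of (values, variances) pairs, one pair per condition,
    where values are log-expression estimates for the replicates and
    variances are their assumed-known uncertainties.
    Null model: one mean shared by all conditions.
    Alternative model: a separate mean for each condition.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    all_values = [y for ys, _ in conditions for y in ys]
    all_variances = [v for _, vs in conditions for v in vs]

    rss_null = weighted_rss(
        all_values, all_variances, weighted_mean(all_values, all_variances)
    )
    rss_alt = sum(
        weighted_rss(ys, vs, weighted_mean(ys, vs)) for ys, vs in conditions
    )

    # 2 * (loglik_alt - loglik_null); the log(2 * pi * v) terms cancel.
    statistic = max(rss_null - rss_alt, 0.0)
    df = len(conditions) - 1
    return statistic, df, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Made-up numbers for a single gene under three conditions, two replicates each.
    gene = [
        ([5.1, 4.9], [0.04, 0.05]),
        ([5.0, 5.2], [0.04, 0.04]),
        ([6.3, 6.1], [0.06, 0.05]),
    ]
    stat, df, p = lrt_multi_condition(gene)
    print(f"LR statistic = {stat:.3f}, df = {df}, p = {p:.3g}")
```

Under these assumptions the statistic follows a chi-square distribution with one degree of freedom fewer than the number of conditions when the null model holds, which is why the same function covers two conditions or many. Measurements with larger variance contribute less to both the fitted means and the statistic, which is the sense in which the uncertainty is carried into the test.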
Keywords/Search Tags: RNA-Seq, gene expression level, uncertainty, differential expression analysis, clustering analysis