
Bayesian Analysis of Single-Cell Sequencing Data

Posted on: 2021-03-20   Degree: Master   Type: Thesis
Country: China   Candidate: H M Liu
GTID: 2370330611959196   Subject: Probability theory and mathematical statistics
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) is a new generation of sequencing technology that can reveal previously hidden cell-to-cell heterogeneity in gene expression within seemingly homogeneous cell populations. However, these experiments are prone to high levels of unexplained technical noise, which makes it harder to identify genes that are genuinely heterogeneously expressed across a cell population. Taking Bayesian modelling as its foundation, this thesis focuses on the BASiCS (Bayesian Analysis of Single-Cell Sequencing Data) framework. The work can be divided into two parts.

The first part uses the BASiCS framework to quantify unexplained technical noise and cell-to-cell heterogeneity. Cell-specific normalization constants are estimated jointly with the other model parameters. Technical variability is quantified using spike-in genes added to each sample, and the total variability of the counts is decomposed into technical and biological components. BASiCS also provides an intuitive test for detecting highly (or lowly) variable genes in the cell population under study. Finally, the method is applied to sequencing data from mouse embryonic stem cells, demonstrating its effectiveness and applicability. The results show that, across all genes, unexplained technical noise accounts for approximately 28% of the total variability of the expression counts in a typical cell, and the data provide strong evidence of biological cell-to-cell heterogeneity.

The second part builds on the BASiCS model to correct for the mean dependence of variability in differential testing. Specifically, within the BASiCS framework, a Poisson structure model (the non-regression model) and a Poisson-negative binomial structure model (the regression model) are established. First, residual measures of cell-to-cell transcriptional variability are obtained that are not confounded by mean expression. Second, to evaluate whether the regression BASiCS model improves posterior inference, a large dataset of CA1 pyramidal neurons is used, from which smaller datasets are generated artificially by randomly sub-sampling 50 to 500 cells. Both the regression and non-regression BASiCS models yield mean expression estimates that remain largely stable across sample sizes and expression levels; however, when the sample size is small, the non-regression model underestimates the gene-specific over-dispersion parameters of lowly expressed genes. To broaden the applicability of the method, the BASiCS model is also extended to handle datasets without spike-in genes. The method provides biological insight into the dynamics of cell-to-cell expression variation and highlights the synchronization of the biosynthetic machinery in immune cells after activation.

The thesis presents and discusses both the non-regression and the regression BASiCS models together with their applications. The models developed here provide a powerful tool for understanding the role of heterogeneity in gene expression and a basis for more complex downstream analyses in single-cell sequencing experiments.
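To make the variance decomposition and the residual over-dispersion idea more concrete, here is a minimal Python sketch. It assumes the hierarchical Poisson-Gamma structure of the published BASiCS model (cell constants phi_j and s_j, technical over-dispersion theta, gene-specific mean mu_i and over-dispersion delta_i); this abstract does not spell out that parameterization, so treat it as an assumption. The function names are hypothetical, parameters are fixed by hand rather than inferred by MCMC, spike-in genes are not simulated, and the regression step uses an ordinary least-squares polynomial fit of log delta on log mu as a simplified stand-in for the regression model's joint treatment of mean and over-dispersion.

```python
import numpy as np


def simulate_counts(mu, delta, phi, s, theta, rng):
    """Draw a genes x cells count matrix from a BASiCS-style Poisson-Gamma hierarchy.

    Assumed parameterization (sketch only):
      nu_j   ~ Gamma(shape=1/theta, scale=s_j*theta)    mean s_j, var s_j^2*theta
      rho_ij ~ Gamma(shape=1/delta_i, scale=delta_i)    mean 1,   var delta_i
      X_ij   ~ Poisson(phi_j * nu_j * mu_i * rho_ij)
    """
    q, n = len(mu), len(phi)
    nu = rng.gamma(shape=1.0 / theta, scale=s * theta)           # technical noise, per cell
    rho = rng.gamma(shape=1.0 / delta[:, None], scale=delta[:, None],
                    size=(q, n))                                 # biological noise, per gene and cell
    lam = mu[:, None] * (phi * nu)[None, :] * rho
    return rng.poisson(lam)


def variance_components(mu, delta, phi, s, theta):
    """Split Var(X_ij) into technical and biological parts (genes x cells arrays)."""
    m = mu[:, None] * (phi * s)[None, :]              # expected count phi_j * s_j * mu_i
    technical = m + m**2 * theta                      # Poisson noise + cell-level technical noise
    biological = m**2 * delta[:, None] * (theta + 1.0)
    return technical, biological


def residual_overdispersion(mu, delta, degree=2):
    """Residual log over-dispersion after removing a polynomial trend in log mean.

    Simplified stand-in for the regression model: an OLS fit on point values,
    not a joint Bayesian treatment of mu and delta.
    """
    x, y = np.log(mu), np.log(delta)
    coef = np.polyfit(x, y, deg=degree)
    return y - np.polyval(coef, x)


# Usage: two genes, identical cells so that the empirical variance across
# cells should match technical + biological variance.
rng = np.random.default_rng(0)
n = 200_000
mu = np.array([5.0, 20.0])
delta = np.array([0.5, 0.1])
phi, s, theta = np.ones(n), np.ones(n), 0.3

X = simulate_counts(mu, delta, phi, s, theta, rng)
tech, bio = variance_components(mu, delta, phi, s, theta)
total = tech[:, 0] + bio[:, 0]           # about 28.75 and 192 for these values
print(X.var(axis=1) / total)             # should be close to 1 for both genes
print(bio[:, 0] / total)                 # share of variance attributed to biology
```

The biological share printed last plays the role of the per-gene quantity that a highly-variable-gene test would threshold; BASiCS itself bases that decision on posterior probabilities rather than on fixed parameter values as here.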
Keywords/Search Tags: Single-cell RNA sequencing, Bayesian, Transcriptional noise, Immune activation