Research On Data Modeling And Simulation Of Single-Cell RNA Sequencing

Posted on: 2022-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: G M Wang
Full Text: PDF
GTID: 2480306605486324
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, single-cell RNA sequencing (scRNA-seq) technology has developed rapidly, and related bioinformatics methods have emerged alongside it. These processing and analysis methods characterize the gene expression heterogeneity of a cell population at single-cell resolution, covering cell typing, differentially expressed gene detection, and trajectory inference. Before these methods are applied in practice, however, they must be evaluated systematically. Compared with evaluation based on real gold-standard data, simulation provides a comprehensive and flexible way to evaluate them. Although a variety of scRNA-seq simulation methods have been proposed in recent years, their performance still needs to be improved. To facilitate the evaluation of scRNA-seq bioinformatics methods, this thesis proposes a set of simulation methods based on a generative model of the scRNA-seq count matrix. The method constructs a negative binomial (NB) or zero-inflated negative binomial (NBZI) distribution model, estimates its parameters from real scRNA-seq data, and then generates a gene expression count matrix. It has three basic working modes. Working mode 1 models mean gene expression and size factors with a Gaussian mixture model and the biological coefficient of variation with an inverse chi-square distribution; it is the most flexible mode. Working mode 2 allows the number of cells and the library size to be adjusted, but keeps the number of genes the same as in the real data. Working mode 3 additionally retains the gene co-expression information of the real data through a copula framework. The experimental results show that, compared with other simulation methods, the data simulated by this method fit a variety of homogeneous, heterogeneous, UMI, and non-UMI datasets better. After parameters have been estimated from homogeneous data, the method can be extended to further simulations, such as simulating cell groups, batches, and differentiation pathways. The simulation method supports the evaluation of many types of analysis methods, including unsupervised cell clustering, differentially expressed gene prediction, trajectory inference, batch correction, and imputation.
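The count-generation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `simulate_counts` and all of its parameters are hypothetical, the gene means, size factors, and biological coefficient of variation (BCV) are passed in directly rather than estimated from real data, and zero inflation is reduced to a single assumed dropout probability. The sketch also assumes the common convention that the NB dispersion equals the squared BCV.

```python
import numpy as np

def simulate_counts(gene_means, bcv, size_factors, dropout_prob=0.0, seed=0):
    """Draw a genes x cells count matrix from an NB or zero-inflated NB model.

    All names here are illustrative assumptions, not taken from the thesis.
    gene_means and bcv must be strictly positive.
    """
    rng = np.random.default_rng(seed)
    gene_means = np.asarray(gene_means, dtype=float)
    size_factors = np.asarray(size_factors, dtype=float)

    # Expected count of gene g in cell c: gene mean scaled by the cell size factor.
    mu = np.outer(gene_means, size_factors)

    # Assumed convention: NB dispersion is the squared BCV (scalar or per gene).
    dispersion = np.broadcast_to(np.asarray(bcv, dtype=float) ** 2, gene_means.shape)
    r = (1.0 / dispersion)[:, None]   # NB size parameter, one per gene
    p = r / (r + mu)                  # success probability that yields mean mu

    counts = rng.negative_binomial(r, p)

    if dropout_prob > 0:
        # Zero inflation: each entry is independently forced to zero
        # with probability dropout_prob.
        keep = rng.random(counts.shape) >= dropout_prob
        counts = counts * keep
    return counts


# Example call: 3 genes, 5 cells, moderate dispersion, 20% extra zeros.
example = simulate_counts([2.0, 10.0, 100.0], bcv=0.4,
                          size_factors=[0.5, 1.0, 1.0, 1.5, 2.0],
                          dropout_prob=0.2)
```

With `dropout_prob=0.0` the function samples from a plain NB model; a positive value gives a simple zero-inflated variant. Retaining gene co-expression, as working mode 3 does through a copula, would require correlated draws across genes and is outside this sketch.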
Keywords/Search Tags:scRNA-seq, Evaluate, Simulation, Negative binomial distribution, Zero inflation, Gene co-expression