| High heterogeneity is a defining characteristic of tumors, and differential expression analysis across tumor types and tumor subtypes may yield biomarkers for early diagnosis and treatment. The development of transcriptome sequencing technology provides a powerful tool for studying cell heterogeneity. Single-cell transcriptome sequencing (scRNA-seq) can quantify gene expression at the single-cell level in a tissue, but because the cells must be dissociated, their locations in the tissue are lost. Spatial transcriptome technology, which has emerged in recent years, can characterize the gene expression profiles of different spots in a tissue while also recording their relative physical locations. However, a key limitation of the current mainstream technology is its lack of single-cell resolution, so each spot carries a mixture of signals from multiple cells. It is therefore critical to design suitable algorithms to integrate and analyze these data. Clustering and deconvolution are two important classes of such algorithms: clustering identifies the expression patterns of distinct cell types, and deconvolution recovers the spatial composition of cell types in a tissue. This paper focuses on an efficient clustering algorithm for large scRNA-seq datasets and a deconvolution method for spatial transcriptome data, and consists of the following parts. First, we propose Secuer, a fast and accurate clustering method for scRNA-seq data based on spectral clustering. By employing an anchor-based bipartite graph representation, Secuer reduces runtime and memory usage by more than one order of magnitude on datasets with more than 1 million cells. Secuer also achieves comparable or better accuracy than competing methods on small and moderate-sized benchmark datasets. Furthermore, we show that Secuer can serve as a building block for a new consensus clustering method, Secuer-consensus, which improves on the runtime and scalability of state-of-the-art consensus clustering methods while maintaining accuracy. Next, we introduce STBayes Deconv, a hierarchical Bayesian statistical model for the deconvolution of spatial transcriptome data that uses annotated scRNA-seq data as prior knowledge. The model assumes that adjacent spots tend to show more similar expression levels. STBayes Deconv uses a spatial linear mixed-effects model to characterize the spatial transcriptome expression profile, combined with a Gaussian process to capture dependencies between spots. The model parameters are estimated by variational inference, which allows the spot-specific expression profiles of different cell types to be estimated. By disentangling the mixed signals, the model may significantly improve downstream analyses, including the identification of spatially variable genes and the determination of spatial domains. We illustrate the superior performance of STBayes Deconv using simulated data. |
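To make the idea of an anchor-based bipartite graph concrete, the following is a minimal sketch of generic anchor (landmark) based spectral clustering in Python with NumPy. It is not the Secuer implementation: the function names `anchor_spectral_clustering` and `kmeans`, the random anchor selection, the Gaussian kernel bandwidth, and all parameter values are assumptions made for illustration. The point it shows is that each cell is connected only to a few anchors, so the spectral embedding comes from an n-by-p matrix (p anchors) rather than an n-by-n cell graph.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialization (illustrative helper)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def anchor_spectral_clustering(X, n_clusters, n_anchors=90, n_neighbors=8, seed=0):
    """Hypothetical sketch: cluster rows of X via a cell-to-anchor bipartite graph."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: anchors are a random subset of cells (k-means centers are another option).
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]
    # Squared Euclidean distances between every cell and every anchor (n x p).
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ anchors.T + (anchors ** 2).sum(1)[None, :]
    d2 = np.maximum(d2, 0.0)
    # Keep only the n_neighbors nearest anchors of each cell.
    idx = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    nearest = d2[rows, idx]
    sigma2 = nearest.mean() + 1e-12  # assumed global kernel bandwidth
    # Bipartite affinity matrix Z; dense here for brevity, sparse in any large-scale setting.
    Z = np.zeros((n, n_anchors))
    Z[rows, idx] = np.exp(-nearest / (2 * sigma2))
    Z /= Z.sum(1, keepdims=True)
    # Scale columns by anchor degree so that Z_hat @ Z_hat.T is a normalized cell graph.
    col = Z.sum(0)
    col[col == 0] = 1.0
    Z_hat = Z / np.sqrt(col)
    # Left singular vectors of the n x p matrix give the spectral embedding of the cells.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    emb = U[:, :n_clusters]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    return kmeans(emb, n_clusters, rng)

# Toy usage on three well-separated synthetic groups (not real scRNA-seq data).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])
labels = anchor_spectral_clustering(X, n_clusters=3)
```

In this sketch the singular value decomposition costs on the order of n times p squared, which is where a reduction in runtime and memory relative to a full n-by-n affinity matrix would come from when p is much smaller than n.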