A Fast Clustering Method For Large Single-cell RNA-seq Data Based On Spectral Clustering

Posted on: 2022-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Nie
GTID: 2510306746967869
Subject: Basic mathematics
Abstract/Summary:
In recent years, single-cell sequencing has been widely used to study cellular heterogeneity, including identifying cell types and discovering new cell types or cell states. In earlier studies of cellular heterogeneity, single-cell sequencing was immature and expensive, so only small amounts of data were available and researchers identified cell types manually using biomarkers. As the technology has matured, its cost has dropped significantly, making it relatively easy to sequence large numbers of single cells. As a result, manually identifying cell types with biomarkers is no longer practical for large single-cell RNA-seq (scRNA-seq) datasets. However, existing clustering algorithms usually require a great deal of time and computation on large scRNA-seq data, and some cannot handle such data at all. To solve this problem, this thesis improves the traditional spectral clustering algorithm and proposes Uspec, a fast clustering method for large scRNA-seq data based on an approximate nearest-representative method and bipartite graph partitioning. Uspec reduces both the running time and the computational cost of clustering large-scale single-cell data. In addition, to obtain higher clustering accuracy on small datasets, this thesis proposes an ensemble clustering method based on Uspec. On gold- and silver-standard datasets, Uspec is much faster than other classical clustering methods while achieving high clustering accuracy. On simulated ultra-large data, Uspec needs only about 1.7 minutes to cluster 10 million data points.
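The general idea named in the abstract, spectral clustering over a bipartite graph between data points and a small set of representatives, can be illustrated with a minimal NumPy sketch. This is not the thesis's Uspec implementation, and several simplifications are assumptions made here for brevity: representatives are drawn uniformly at random, nearest representatives are found exactly rather than approximately, edge weights use a Gaussian kernel, and the partitioning step is an SVD of the normalized point-to-representative affinity matrix. The function names `kmeans` and `bipartite_spectral_clustering` are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialization."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its closest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


def bipartite_spectral_clustering(X, n_clusters, n_reps=50, n_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1 (assumption): pick representatives by uniform random sampling.
    reps = X[rng.choice(n, size=n_reps, replace=False)]

    # Step 2 (assumption): exact nearest representatives via a dense
    # n x n_reps distance matrix. This is only for illustration; an
    # approximate search would avoid forming the full matrix.
    d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, nn].mean()

    # Sparse-pattern affinity matrix B of the bipartite graph
    # (points on one side, representatives on the other).
    B = np.zeros((n, n_reps))
    B[rows, nn] = np.exp(-d2[rows, nn] / (2.0 * sigma2))

    # Step 3: partition the bipartite graph. Normalize B by the degrees of
    # both sides and take the leading left singular vectors as an embedding.
    dx = B.sum(axis=1)
    dr = np.maximum(B.sum(axis=0), 1e-12)
    Z = B / np.sqrt(dx)[:, None] / np.sqrt(dr)[None, :]
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    emb = U[:, :n_clusters]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Step 4: cluster the embedded points.
    labels, _ = kmeans(emb, n_clusters, rng)
    return labels
```

The point of the construction is that the SVD is taken on an n × p matrix (p representatives) rather than on an n × n affinity matrix, which is where the savings over traditional spectral clustering come from when p is much smaller than n. Calling `bipartite_spectral_clustering(X, n_clusters=3)` on an array `X` of shape (n, d) returns one integer label per row.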
Keywords/Search Tags: large scRNA-seq data, fast clustering, spectral clustering algorithm, ensemble clustering, bipartite graph