| Single-cell RNA sequencing (scRNA-seq) technology extracts transcriptomic information at single-cell resolution, distinguishes heterogeneity between cells, and helps us explore in depth the properties, fate, and functional structure of cells. Cell clustering, a common analytical tool, differentiates cell types by mining similarities between cells, and it plays an increasingly important role in single-cell studies of complex organ tissues and in the diagnosis and treatment of clinical diseases. Achieving accurate clustering of single-cell data is therefore of great research significance in bioinformatics. A review of previous scRNA-seq clustering algorithms shows that most of them apply a single, fixed choice of key operations such as data preprocessing and dimensionality reduction to every dataset. However, the sensitivity of different datasets to preprocessing and dimensionality reduction methods varies significantly. Based on these considerations, we propose SCM, a novel single-cell clustering algorithm based on a scoring consensus matrix. The work and innovations of this paper rest on two main points: 1. Optimize the preprocessing and dimensionality reduction methods and design an f-value scoring mechanism based on the consensus matrix, so that the optimal combination of preprocessing and dimensionality reduction methods is selected for each specific dataset. This paper provides three preprocessing methods (log transform, no transform, and z-score transform) and three pairwise combinations of three dimensionality reduction methods (PCA+UMAP, LE+UMAP, and LE+PCA). SCM then uses the f-value scoring mechanism based on the consensus matrix to calculate the f-value of each combination, and identifies the optimal combination of preprocessing and dimensionality reduction methods via the f-value. The experimental results confirm the strong effectiveness of the f-value in identifying the optimal
combination of preprocessing and dimensionality reduction methods. 2. Devise two novel distance measures for the optimal consensus matrix. Instead of the commonly used Euclidean distance, two distance metrics are proposed in this paper. The first is a weighted distance metric based on the d-score: it draws on the valid information in several different distance metrics, uses the inverse of each d-score as a weight, and constructs a new distance metric as the resulting weighted sum. The second is a distance metric based on WGCNA, which takes the indirect distance between cells into account and fully captures the topological information between cells. Both distances offer flexibility in deriving an accurate distance metric between cells for a variety of single-cell datasets. Experiments on eleven benchmark datasets show that SCM outperforms nearly all of seven other popular clustering algorithms, which demonstrates its great potential for revealing cellular heterogeneity, identifying cell types, depicting cell functional states, inferring cellular dynamics, and other related research areas. |
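
To illustrate what "indirect distance" means for a WGCNA-style metric, the following is a minimal sketch of the standard topological overlap measure (TOM) used in WGCNA, converted to a dissimilarity as 1 - TOM. It is not taken from the SCM implementation. The function name `tom_distance` and the toy adjacency matrix are hypothetical, and we assume the input is a symmetric cell-by-cell adjacency matrix with entries in [0, 1], for example one derived from a consensus matrix.

```python
import numpy as np


def tom_distance(adjacency: np.ndarray) -> np.ndarray:
    """Return the topological-overlap dissimilarity (1 - TOM).

    `adjacency` is assumed to be a symmetric matrix with entries in [0, 1];
    how SCM builds such a matrix is not specified here (assumption).
    """
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)            # ignore self-connections
    shared = a @ a                      # l_ij = sum_u a_iu * a_uj (shared neighbours)
    k = a.sum(axis=1)                   # connectivity of each cell
    min_k = np.minimum.outer(k, k)      # min(k_i, k_j) for every pair
    tom = (shared + a) / (min_k + 1.0 - a)
    np.fill_diagonal(tom, 1.0)          # a cell fully overlaps with itself
    return 1.0 - tom


# Toy example (hypothetical): cells 1 and 2 are not directly linked,
# but both are linked to cell 0.
toy_adjacency = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])
toy_distance = tom_distance(toy_adjacency)
```

In the toy example, a purely direct measure would place cells 1 and 2 at maximal distance, whereas the shared neighbour (cell 0) pulls their topological-overlap distance down to 0.5.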