As researchers analyze single-cell data in ever greater depth, more and more cell types are being discovered and identified, and accurately identifying rare cell types has become a research bottleneck. The discovery of rare cell types, such as stem cells, transient precursor cells, cancer stem cells, and circulating tumor cells, has important implications for understanding tissue biology in both normal and disease states. Although several rare cell detection methods have been proposed, most of them are built on clustering algorithms. However, when the data distribution is extremely imbalanced and the biological complexity is high, clustering-based predictions often suffer from low accuracy, high false-positive rates, and poor time efficiency. To address these problems, this paper proposes two methods for detecting rare cell types. The main contributions are as follows.

First, to improve the accuracy and time efficiency of rare cell detection, this paper proposes Topo, a rare cell detection method based on topological properties. The model first computes four topological properties of each cell's neighborhood (entropy, mean, median, and skewness) to construct a topological feature matrix. An outlier detection algorithm is then applied to this matrix to obtain a continuous rarity score for each single-cell expression profile. Because practical datasets often contain several rare cell types, we further propose a rare cell identification framework built on Topo: Topo first detects the rare cells, and a conventional clustering algorithm then divides them into subtypes. Finally, we compare Topo with other methods on 12 simulated datasets and 5 benchmark datasets (Baron_human, Shekhar, HCF-spleen, Macosko, and PBMC 68k); the results show that Topo significantly improves both recognition accuracy and running time. We also verify the effectiveness of the Topo framework for dividing rare cells into subtypes. The innovation of Topo lies in constructing a topological feature matrix from the differences between rare cells and their neighbors, which effectively reduces the data dimensionality and improves prediction accuracy and time efficiency. In addition, the Topo framework can be readily extended to different clustering algorithms.

Second, to further improve the accuracy and stability of rare cell detection, this paper proposes HKT, a rare cell detection method that fuses biological features with topological features of the data. The model builds three feature matrices: the original feature matrix, the nearest-neighbor distance matrix, and the topological feature matrix. Two outlier detection algorithms are applied to these matrices, yielding six sets of rarity scores, which we then fuse with a further outlier detection algorithm to obtain the final rare cell type predictions. In comparative experiments against six base models on the benchmark datasets, HKT produced significantly more accurate and stable predictions. We also applied HKT to analyze the composition of the immune microenvironment in patients with acute myeloid leukemia (AML), where it effectively identified rare cells such as killer T lymphocytes. The innovation of HKT is that it uses an ensemble approach to integrate biological features and topological features of the data for rare cell detection.
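The Topo pipeline (neighborhood, then four per-cell statistics, then an outlier score) can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that a cell's "neighborhood" is the vector of Euclidean distances to its k nearest neighbors (k = 10 is arbitrary), that entropy means the Shannon entropy of those distances normalized to sum to one, and that the unnamed outlier detection algorithm can be replaced by a simple kNN-distance outlier score in z-scored feature space. The input `X` would typically be a reduced expression matrix (for example, PCA coordinates). `topo_features`, `knn_outlier_score`, and `topo_rarity` are hypothetical names.

```python
import numpy as np


def knn_distances(X: np.ndarray, k: int) -> np.ndarray:
    """Sorted Euclidean distances from each row of X to its k nearest other rows."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a cell is not its own neighbor
    return np.sort(D, axis=1)[:, :k]


def topo_features(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Assumed topological feature matrix (n_cells x 4): entropy, mean, median
    and skewness of each cell's k-nearest-neighbor distance profile."""
    d = knn_distances(X, k)
    p = d / (d.sum(axis=1, keepdims=True) + 1e-12)  # treat distances as a distribution
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    mean = d.mean(axis=1)
    median = np.median(d, axis=1)
    centered = d - mean[:, None]
    m2 = (centered ** 2).mean(axis=1)
    m3 = (centered ** 3).mean(axis=1)
    skew = m3 / np.maximum(m2, 1e-12) ** 1.5  # population (biased) skewness
    return np.column_stack([entropy, mean, median, skew])


def knn_outlier_score(F: np.ndarray, k: int = 10) -> np.ndarray:
    """Stand-in outlier detector: mean distance to the k nearest neighbors in
    z-scored feature space. Larger scores mean more isolated (rarer) cells."""
    std = F.std(axis=0)
    Z = (F - F.mean(axis=0)) / np.where(std > 0, std, 1.0)
    return knn_distances(Z, k).mean(axis=1)


def topo_rarity(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Continuous rarity score per cell (hypothetical Topo-style pipeline)."""
    return knn_outlier_score(topo_features(X, k), k)


# Synthetic example: 200 common cells plus 5 planted rare cells (indices 200-204).
rng = np.random.default_rng(0)
common = rng.normal(0.0, 1.0, size=(200, 10))
rare = rng.normal(8.0, 0.1, size=(5, 10))
X = np.vstack([common, rare])

scores = topo_rarity(X, k=10)
top5 = np.argsort(scores)[-5:]
print(sorted(top5.tolist()))
```

In this toy setup the planted rare cells should rank highest, because each of their neighborhoods mixes a few very close rare neighbors with many distant common cells, which shifts all four statistics at once. The full pairwise distance matrix costs O(n²) memory, so real datasets would call for a spatial index or approximate nearest-neighbor search.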
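The two-stage Topo framework (detect rare cells first, then divide only those cells into subtypes) can be sketched as follows. This sketch makes several assumptions. Rarity scores are taken as given (in practice they would come from a detector such as Topo). The number of cells to keep (`n_rare`) and the number of subtypes (`n_subtypes`) are user-chosen. Plain k-means with deterministic farthest-point initialization stands in for the "conventional clustering algorithms" mentioned above. `rare_subtypes` is a hypothetical name.

```python
import numpy as np


def kmeans(X: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # next center: the point farthest from every center chosen so far
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def rare_subtypes(X, rarity, n_rare, n_subtypes):
    """Hypothetical two-stage framework: keep the n_rare highest-scoring cells,
    then cluster only those cells into n_subtypes groups."""
    rare_idx = np.argsort(rarity)[-n_rare:]
    return rare_idx, kmeans(X[rare_idx], n_subtypes)


rng = np.random.default_rng(1)
common = rng.normal(0.0, 1.0, size=(100, 5))
subtype_a = rng.normal(10.0, 0.1, size=(4, 5))   # indices 100-103
subtype_b = rng.normal(-10.0, 0.1, size=(4, 5))  # indices 104-107
X = np.vstack([common, subtype_a, subtype_b])

# Stand-in rarity scores; a real pipeline would compute these with a rare cell detector.
rarity = np.r_[np.zeros(100), np.ones(8)]
rare_idx, sub_labels = rare_subtypes(X, rarity, n_rare=8, n_subtypes=2)
```

Because the clustering step only sees the flagged subset, `kmeans` could be swapped for any other clustering routine without touching the detection stage, which is the kind of extensibility the framework aims for.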
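The HKT ensemble (three feature matrices, two detectors, six base scores, then a fusion detector) can be sketched the same way, again as an illustration rather than the authors' code. Assumptions: the nearest-neighbor distance matrix is each cell's sorted distances to its k nearest neighbors; the topological matrix uses the same entropy, mean, median, and skewness statistics as Topo; and because the source does not name its detectors, a kNN-distance score and a distance-from-centroid score are swapped in, with the kNN score reused as the fusion detector. `hkt_style_scores` and all helper names are hypothetical.

```python
import numpy as np


def knn_distances(X, k):
    """Sorted Euclidean distances from each row to its k nearest other rows."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    return np.sort(D, axis=1)[:, :k]


def zscore(F):
    std = F.std(axis=0)
    return (F - F.mean(axis=0)) / np.where(std > 0, std, 1.0)


def topo_matrix(d):
    """Assumed topological features of each kNN-distance row: entropy, mean, median, skewness."""
    p = d / (d.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    mean = d.mean(axis=1)
    c = d - mean[:, None]
    skew = (c ** 3).mean(axis=1) / np.maximum((c ** 2).mean(axis=1), 1e-12) ** 1.5
    return np.column_stack([entropy, mean, np.median(d, axis=1), skew])


# Two stand-in outlier detectors (larger score = rarer cell).
def knn_score(F, k):
    """Mean distance to the k nearest neighbors in z-scored space."""
    return knn_distances(zscore(F), k).mean(axis=1)


def centroid_score(F, k):
    """Distance from the z-scored centroid; k is unused but kept for a uniform signature."""
    return np.linalg.norm(zscore(F), axis=1)


def hkt_style_scores(X, k=10):
    """Hypothetical HKT-style ensemble: 3 feature matrices x 2 detectors = 6 base
    rarity scores, fused by running a detector on the 6-column score matrix."""
    d = knn_distances(X, k)
    matrices = [X, d, topo_matrix(d)]  # original, nearest-neighbor distance, topological
    detectors = [knn_score, centroid_score]
    base = np.column_stack([det(F, k) for F in matrices for det in detectors])
    fused = knn_score(base, k)  # fusion: outlier detection on the six base scores
    return fused, base


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 10)),  # common cells
               rng.normal(8.0, 0.1, size=(5, 10))])  # planted rare cells, indices 200-204
fused, base = hkt_style_scores(X)
```

Here each column of `base` plays the role of one base model, and `fused` is the ensemble output. Using an outlier detector for fusion, rather than a plain average, flags cells whose whole six-score profile is unusual.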