Immune cells play vital roles in both health and disease. They maintain the body's immune homeostasis by identifying, capturing, and eliminating foreign pathogens. To gain a comprehensive understanding of immune cell function and regulatory mechanisms, researchers employ diverse systematic approaches, with transcriptome sequencing being one of the important methods. Transcriptome sequencing can comprehensively profile gene expression in immune cells. By comparing transcriptome data between healthy and disease states, disease-related differentially expressed genes can be found, and their functions and regulatory networks can be studied in depth. Single-cell RNA sequencing (scRNA-Seq) enables more detailed analysis of immune cells, revealing heterogeneity among cells and identifying and classifying immune cell subgroups. However, accurately identifying immune cell types in scRNA-Seq data is challenging because of the high heterogeneity and sparsity of the data and the similarity in gene expression among different immune cell types. The traditional approach to cell type annotation starts with an unsupervised clustering algorithm that groups cells with similar profiles, followed by manual inspection of each cluster for the expression of marker genes that distinguish specific cell types. This process is time-consuming and labor-intensive, and the annotation results are relatively subjective and lack reproducibility. Although some automated annotation tools have been developed, few are designed specifically for immune cell type identification, the cell types they can annotate are limited, especially for immune cell subtypes, and there is no unified standard for annotation output, which makes it difficult to directly compare results between tools. Therefore, it is necessary to develop new computational methods and tools for the accurate identification and refined annotation of immune cell types in scRNA-Seq 
data.

To address this issue, we first collected a substantial amount of scRNA-Seq data with annotation labels covering the majority of immune cell types, and revised the annotation labels of these datasets to a unified standard. Second, to tackle the challenge of inconsistent immune cell annotation labels, we annotated immune cells hierarchically based on their developmental lineage. Using optimized gene sets and the single-sample gene set enrichment analysis (ssGSEA) algorithm, we developed a tool called sc-ImmuCC (single-cell RNA-Seq based immune cell composition) for hierarchical annotation of immune cell types from scRNA-Seq data. sc-ImmuCC mimics the natural differentiation of immune cells; its hierarchical annotation comprises three layers and can annotate nine major immune cell types and 29 cell subtypes. The results demonstrate stable performance and strong consistency across datasets from different tissues, with an average accuracy of 71-90%. In addition, the optimized gene sets and hierarchical annotation strategy can be applied to other methods to improve their annotation accuracy and increase the number of cell subtypes they can annotate.

Furthermore, to overcome the limitation that sc-ImmuCC annotates only known cell types, we applied open set learning, which is commonly used in image recognition, to cell type annotation in scRNA-Seq data. After comparing different feature selection methods, we modeled the immune cell dataset after feature selection (non-immune cell data were modeled directly), and continuously tested and optimized the model. Ultimately, we created open-set recognition models capable of accurately identifying both non-immune and immune cell types with stable annotation performance.

Lastly, to evaluate the similarities and differences in the immune responses induced by respiratory viruses, and to reveal the mechanisms of host response to different respiratory viruses, we analyzed transcriptome data from different respiratory 
viral infections. We applied sc-ImmuCC and tissue immune cell quantification tools to identify and quantify immune cells in these datasets under a unified analysis standard. The results showed that cell type composition in COVID-19 patients differs between tissues, and that the proportions of T cells, natural killer cells, and monocytes in the same patient vary significantly across time points. Additionally, the proportion of monocytes in COVID-19 and influenza patients was significantly higher than in healthy donors.

In summary, this study used scRNA-Seq data from different human tissues to develop a hierarchical immune cell annotation model based on the ssGSEA algorithm, providing a reliable tool for accurately identifying immune cell types in scRNA-Seq data. sc-ImmuCC can help us better understand differences in the number and composition of immune cells across tissues and analyze the immune response in different pathological states. Further, we developed a cell identification model based on open set learning and tested it on different datasets. Both open-set and closed-set tests achieved stable performance in cell type identification, indicating that this approach is expected to contribute to the precise annotation of immune cells in scRNA-Seq data. We applied sc-ImmuCC to the COVID-19 patient dataset and achieved fast and accurate annotation of immune cells. sc-ImmuCC therefore enables swift annotation and comparison of immune cell numbers and gene expression differences across tissues or pathological states, facilitating the exploration of immune commonalities and differences among patients with diverse clinical symptoms and enriching our knowledge of immune cell functions under different physiological or pathological conditions. In conclusion, this study presents a viable annotation method for identifying immune cells in scRNA-Seq data that can quantitatively identify immune cells in tissues, providing more 
clues and ideas for immunological research. Moreover, developing accurate immune cell identification methods can improve the efficiency and accuracy of scRNA-Seq data analysis, enabling better use of such data in research.
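
The hierarchical, gene-set-based annotation strategy described above can be illustrated with a minimal sketch. This is not the sc-ImmuCC implementation: the enrichment function is a simplified ssGSEA-style score (a rank-weighted running sum, with `alpha=0.25` assumed as the weight), the two-layer `HIERARCHY` is a hypothetical stand-in for the three-layer structure, and the marker lists are small illustrative examples rather than the optimized gene sets. The names `ssgsea_score`, `annotate`, and `HIERARCHY` are introduced here for illustration only.

```python
import numpy as np

# Hypothetical two-layer hierarchy with illustrative marker genes
# (assumption: NOT the optimized gene sets used by sc-ImmuCC).
HIERARCHY = {
    "root": {"T cell": ["CD3D", "CD3E"], "B cell": ["CD79A", "MS4A1"]},
    "T cell": {"CD4 T cell": ["CD4", "IL7R"], "CD8 T cell": ["CD8A", "CD8B"]},
}

def ssgsea_score(expr, genes, gene_set, alpha=0.25):
    """Simplified ssGSEA-style enrichment score of one gene set in one cell."""
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(-expr, kind="stable")           # highest expression first
    ranks = np.arange(len(expr), 0, -1, dtype=float)   # top gene gets the largest weight
    in_set = np.isin(np.asarray(genes)[order], list(gene_set))
    if not in_set.any() or in_set.all():
        return 0.0                                      # score undefined; treat as neutral
    hit = np.where(in_set, ranks ** alpha, 0.0)
    p_hit = np.cumsum(hit) / hit.sum()                  # weighted ECDF of set genes
    p_miss = np.cumsum(~in_set) / (~in_set).sum()       # ECDF of all other genes
    return float(np.sum(p_hit - p_miss))                # integrate the difference

def annotate(expr, genes, hierarchy=HIERARCHY, node="root"):
    """Walk the hierarchy, choosing the best-scoring child at each layer."""
    path = []
    while node in hierarchy:
        scores = {label: ssgsea_score(expr, genes, gene_set)
                  for label, gene_set in hierarchy[node].items()}
        node = max(scores, key=scores.get)
        path.append(node)
    return path

if __name__ == "__main__":
    genes = ["CD3D", "CD3E", "CD79A", "MS4A1", "CD4",
             "IL7R", "CD8A", "CD8B", "ACTB", "GAPDH"]
    toy_cell = [5, 4, 0, 0, 0, 1, 6, 5, 3, 2]           # made-up expression values
    print(annotate(toy_cell, genes))
```

Annotating layer by layer means that each decision only has to separate closely related candidates within one lineage, which is one plausible reason a hierarchical scheme can resolve subtypes that a flat comparison would confuse.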