Single-cell Clustering Method Based On Consensus Strategy Evaluation

Posted on: 2022-06-28 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Wang | Full Text: PDF
GTID: 2480306311450794 | Subject: Statistics
Abstract/Summary:
Advances in single-cell RNA sequencing (scRNA-seq) have created great opportunities for the quantitative characterization of cell types. scRNA-seq extracts transcriptome information at the resolution of a single cell, which has fundamentally changed traditional transcriptome research. It helps identify new cell types and provides new research ideas for in-depth study of the occurrence, development mechanisms, diagnosis, and treatment of complex diseases. Clustering algorithms based on single-cell gene expression have also developed rapidly. Clustering is a core step in defining cell types from transcriptome data, and it is one of the most widespread applications of scRNA-seq.

SC3 is a widely used and effective single-cell clustering method that aims to obtain more reliable clustering results through consensus clustering. In consensus clustering, the results of multiple k-means runs are combined into a consensus matrix; distances are then computed from the consensus matrix and used for hierarchical clustering. Each value in the consensus matrix represents the probability that two cells are grouped into the same cluster across the multiple k-means results, so it measures how closely the two cells are related. Hierarchical clustering on the distances computed from the consensus matrix yields the final single-cell clustering. How to construct the consensus matrix optimally, and how to compute the optimal distance from it, are therefore pressing problems.

Building on consensus clustering, this paper proposes a single-cell clustering method based on consensus strategy evaluation (SECC), which improves clustering by optimizing both the construction of the consensus matrix and the distance calculation based on it. The main work and innovations of this paper are as follows:

(1) We found that different data preprocessing methods have quite different effects on clustering algorithms. Moreover, no single preprocessing method is applicable to all clustering algorithms, and even for the same clustering algorithm the best preprocessing method depends on the input data. We therefore introduced three data preprocessing methods and designed a preprocessing evaluation algorithm to select the best one for the current dataset.

(2) In hierarchical clustering, different distance calculation methods influence the clustering results differently, and the optimal distance calculation method differs between datasets. We used four distance calculation methods to transform the consensus matrix into four distance matrices for subsequent hierarchical clustering, evaluated the four clustering results, and selected the three better-performing distance calculation methods to build a synthesis matrix, from which a more reliable final clustering result is obtained.
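
The generic consensus clustering step that the abstract describes (several k-means runs, a consensus matrix, then hierarchical clustering on distances derived from that matrix) can be illustrated with a minimal Python sketch. This is not the SC3 or SECC implementation: the helper names (`kmeans_labels`, `consensus_matrix`, `consensus_clusters`), the toy data, the Euclidean default metric, and the complete-linkage choice are all assumptions made for illustration, and the preprocessing evaluation and synthesis-matrix steps are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (illustrative); returns one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each cell to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of cells shares a cluster."""
    label_runs = np.asarray(label_runs)
    n = label_runs.shape[1]
    C = np.zeros((n, n))
    for labels in label_runs:
        C += labels[:, None] == labels[None, :]
    return C / len(label_runs)


def consensus_clusters(C, k, metric="euclidean"):
    """Hierarchical clustering on distances between rows of C.

    The metric and the complete linkage are assumptions of this sketch.
    """
    Z = linkage(pdist(C, metric=metric), method="complete")
    return fcluster(Z, t=k, criterion="maxclust")


rng = np.random.default_rng(0)
# Toy "expression" data: two separated groups of 20 cells with 5 genes each.
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(8, 1, (20, 5))])
runs = [kmeans_labels(X, k=2, rng=rng) for _ in range(10)]
C = consensus_matrix(runs)
final = consensus_clusters(C, k=2)
```

Passing a different `metric` to `consensus_clusters` is one way to compare how the distance definition changes the final clustering. The abstract does not name the four distance methods used in the thesis, so any particular choice here is hypothetical.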
Keywords/Search Tags: single-cell RNA-seq, gene expression data, single-cell clustering, synthesis distance matrix, preprocessing method, SC3