
Study On Single Cell Clustering Method Based On Ensemble Learning

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
GTID: 2480306731987749
Subject: Computer Science and Technology
Abstract/Summary:
In the past few years, many clustering algorithms have been proposed to analyze scRNA-seq data. However, scRNA-seq data pose several challenges, such as high noise and high dimensionality, which make identifying cell types more time-consuming. This complexity also makes systematic data analysis harder, so algorithms need continual adjustment. Most proposed methods either use all the features or cluster after cumbersome feature calculations, which usually incurs a large computational cost on high-dimensional data. In addition, most existing methods rely on a single clustering algorithm, whose results rarely generalize well to complex single-cell data. To address these problems, this thesis proposes two clustering methods for analyzing scRNA-seq data. The main work is as follows:

(1) A single-cell clustering algorithm based on random subspace sampling is proposed. We first construct a latent feature space from the original feature space, then generate multiple subspaces by repeatedly sampling the latent feature space at random, and construct a K-nearest-neighbor graph in each generated subspace to capture its local structure. Next, to fuse the affinity graphs from the multiple subspaces, we use an iterative similarity network fusion scheme, and the fused graph is used for the final spectral clustering. Experimental results show that the algorithm can effectively identify single-cell types.

(2) A single-cell clustering method based on link ensemble is proposed. This method uses a clustering ensemble strategy to cluster high-dimensional data. First, five previously proposed methods for single-cell clustering analysis are used to obtain several different clustering solutions, and the optimal subset of these solutions is selected according to their diversity. Second, an ensemble method based on link similarity assessment creates a refined cluster association matrix by considering the similarity of adjacent points, which integrates the information from the different clustering solutions. Finally, hierarchical clustering is used to generate the final data partition. Experimental results show that this algorithm can accurately identify single-cell clusters, outperforms other advanced algorithms for this task, and can better support the analysis of cell heterogeneity.
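The ensemble idea in method (2) can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the thesis's implementation: it builds a plain co-association matrix (the fraction of base clusterings that place two cells together) instead of the refined link-based matrix, skips the diversity-based selection step, and uses a naive average-linkage merge. The function names and the toy `base_partitions` are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings that assign each pair of cells to the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def average_linkage(similarity, k):
    """Merge the two clusters with the highest mean pairwise similarity until k remain."""
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(similarity[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(similarity)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Toy input (assumed): three base clusterings of six cells.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 2],
]
association = co_association(base_partitions)
consensus = average_linkage(association, k=2)
print(consensus)
```

Cluster labels differ across the base clusterings, which is why the sketch compares pairs of cells rather than label values. A link-based refinement, as described in the abstract, would additionally raise the association of pairs whose clusters share many neighbors.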
Keywords/Search Tags: scRNA-seq, Heterogeneity, Clustering, Ensemble learning, Subspace