A Cluster Ensemble Based Method For Identifying Cell Types

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Xue
Full Text: PDF
GTID: 2518306050467294
Subject: Computer application technology
Abstract/Summary:
Recent advances in single-cell sequencing have enabled high-throughput measurement of gene expression at the single-cell level across tissue types and cell states. This allows biologists to analyze cellular heterogeneity within cell populations and has prompted a large body of single-cell research. Identifying cell types from single-cell data underlies many of these studies, and the accuracy of cell type recognition is crucial to downstream analysis. Traditional methods for identifying cell types have been limited by single-cell sequencing technology and could rely only on single-cell transcriptome sequencing data. This rich body of work nevertheless lays a solid foundation for analyzing single-cell multi-omics data. With the development of single-cell parallel sequencing, it is now possible to obtain single-cell multi-omics data, which lets researchers describe the state of a cell from multiple omics perspectives. Much earlier work applied the idea of integration to bulk sequencing data to identify cancer subtypes, and single-cell multi-omics parallel sequencing now makes it possible to integrate multi-omics data at the single-cell level. Applying integration to single-cell data is therefore of great significance.

This thesis presents a cluster ensemble-based model for cell type identification. The model can be applied to single-cell transcriptome data and can also integrate single-cell multi-omics data for cell type recognition. It consists of three modules: an individual clustering module, a basic partition filter module, and a weighted CSPA integration module. The individual clustering module applies SC3, t-SNE + k-means, SIMLR, k-means, and spectral clustering to the dataset. In the basic partition filter module, the clustering result of each method is screened with three internal clustering evaluation indexes: the poorer results are excluded, and the remaining partitions are weighted according to their evaluation indexes. The weighted CSPA integration module then integrates the retained basic partitions and computes the sample similarity, and the final cell types are obtained by spectral clustering.

Applying the model to single-cell transcriptome data yields the Cluster Ensemble Based (CEB) method for identifying cell types. On five single-cell transcriptome datasets, the cell type recognition results after integration were compared with those of each individual clustering method before integration, and the CEB method outperformed the individual clustering methods in both accuracy and robustness. The standard CEB method was also compared with the incomplete CEB, no-filtering CEB, random CEB, and unweighted CEB variants by changing which individual clustering methods were integrated. Changing the integrated clustering methods led to only small variations in the CEB results, which shows that the CEB method is robust. In addition, the standard CEB method gave better results than the no-filtering and unweighted variants, which indicates that the basic partition filter module and the weighted CSPA integration module do improve the accuracy of the clustering results.

Applying the model to single-cell multi-omics data yields the Multi-omics Cluster Ensemble-Based (MCEB) method for cell type identification. Experiments on the Anger dataset showed that MCEB identified cell types on multi-omics data better than the single-omics methods and the other methods compared, indicating that MCEB is accurate and robust on multi-omics data. Experiments on the Clark dataset showed that MCEB can identify new cell types, so the method offers some guidance for identifying cell types through single-cell multi-omics integration.
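
The filter-and-weight step followed by CSPA-style integration can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not say how the three internal evaluation indexes are combined, what threshold excludes a partition, or how the weights are derived, so the sketch assumes one precomputed quality score per partition, a hypothetical `min_score` cutoff, and weights proportional to the scores. The function names `coassociation` and `weighted_cspa` are illustrative only. The final spectral clustering of the consensus similarity matrix is left out.

```python
from typing import List, Sequence


def coassociation(labels: Sequence[int]) -> List[List[float]]:
    """Binary co-association matrix: 1.0 where two samples share a cluster."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def weighted_cspa(partitions: Sequence[Sequence[int]],
                  scores: Sequence[float],
                  min_score: float) -> List[List[float]]:
    """Filter base partitions by score, then average their co-association
    matrices with weights proportional to the surviving scores.

    Assumption: `scores` holds one quality value per partition (higher is
    better); how the thesis derives it from three indexes is not specified.
    """
    kept = [(p, s) for p, s in zip(partitions, scores) if s >= min_score]
    if not kept:
        raise ValueError("no partition passed the filter")
    total = sum(s for _, s in kept)
    n = len(kept[0][0])
    similarity = [[0.0] * n for _ in range(n)]
    for labels, score in kept:
        weight = score / total  # normalized so the weights sum to 1
        co = coassociation(labels)
        for i in range(n):
            for j in range(n):
                similarity[i][j] += weight * co[i][j]
    return similarity


# Three base partitions of four cells; the third scores poorly and is dropped.
base_partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
quality_scores = [0.6, 0.4, 0.1]
consensus = weighted_cspa(base_partitions, quality_scores, min_score=0.2)
```

The resulting `consensus` matrix is a sample-by-sample similarity that could be passed to any spectral clustering routine that accepts a precomputed affinity. Cluster label names do not need to match across partitions, because only co-membership is counted.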
Keywords/Search Tags: single cell sequencing, multi-omics data, cell type recognition, cluster ensemble