
Research On Clustering Methods Of Single-cell RNA Sequence Based On Machine Learning

Posted on: 2022-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
GTID: 2480306548996989
Subject: Mathematics
Abstract/Summary:
Advances in single-cell RNA sequencing (scRNA-Seq) technology enable researchers to analyze genome-wide transcriptional profiles and address biological problems at the resolution of a single cell. Clustering is an important step in the analysis of scRNA-Seq data. Its main purpose is to identify the number of cell types and reveal the transcriptome characteristics of each cell type, which makes it a key problem in the study of cell heterogeneity and cell differentiation. However, with recent advances in scRNA-Seq technology, the volume of single-cell data is growing exponentially, and existing clustering methods struggle with the high dropout rate and the curse of dimensionality in these data. Accordingly, it is of great significance to develop clustering methods suited to the characteristics of single-cell data. This thesis studies clustering methods for single-cell RNA-Seq data based on machine learning. The research contents are as follows:

1. We propose a new clustering model, scBKAP, based on a deep learning autoencoder network. First, a gene filter removes genes that are not expressed in more than 95% of cells. Second, the autoencoder network reconstructs the filtered scRNA-Seq dataset to reduce the influence of dropout values, yielding a reconstructed expression matrix. Third, the M3Drop algorithm is used to select genes from the reconstructed data, and the PHATE algorithm extracts the main information to obtain a low-dimensional representation of the original data. Finally, the bisecting K-means algorithm clusters the cells. Clustering accuracy is obtained by comparing the results with the real labels. The performance of scBKAP was tested on 19 real scRNA-Seq datasets and 2 simulated datasets, and its results were compared with those of 9 state-of-the-art clustering methods. On four evaluation indexes, scBKAP outperformed the other methods.

2. We develop a new clustering model, ISGAN, based on generative adversarial networks. First, ISGAN filters the single-cell RNA-Seq dataset, deleting genes with a value of 0 in more than 95% of the cells. Second, a Wasserstein generative adversarial network (WGAN) imputes the filtered scRNA-Seq data, producing an imputed expression matrix; the purpose of the WGAN is to eliminate the effect of dropout values through imputation. Third, PCA reduces the dimension of the imputed data, extracting its principal components and easing the work of the main dimension reduction algorithm. The principal components extracted by PCA are then further reduced by the Isomap algorithm. Finally, adaptive spectral clustering is applied to the reduced data to obtain the final clustering results. Clustering accuracy is obtained by comparing the results with the real labels. We ran ISGAN and 9 other state-of-the-art clustering methods on 19 real single-cell RNA-Seq datasets and 2 simulated datasets and compared the results. On four clustering evaluation indexes (NMI, ARI, HOM and AMI), ISGAN clearly outperformed the other methods.
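Two steps shared by both pipelines, the gene filter and the comparison of cluster assignments with real labels, can be illustrated with a minimal sketch. This is not the thesis code: the function names `filter_genes` and `adjusted_rand_index`, the parameter `max_zero_fraction`, the cells-by-genes matrix layout, and the toy data are all assumptions made for illustration. ARI is one of the four indexes named in the abstract; the sketch uses its standard pair-counting definition.

```python
from collections import Counter
from math import comb


def filter_genes(matrix, max_zero_fraction=0.95):
    """Return the indices of genes (columns) to keep.

    Assumed layout: rows are cells, columns are genes.
    A gene is dropped when its value is 0 in more than
    `max_zero_fraction` of the cells.
    """
    n_cells = len(matrix)
    keep = []
    for g in range(len(matrix[0])):
        zeros = sum(1 for row in matrix if row[g] == 0)
        if zeros / n_cells <= max_zero_fraction:
            keep.append(g)
    return keep


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same cells."""
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on
    # Count pairs of cells that share a cluster in both / each labeling.
    joint = Counter(zip(true_labels, pred_labels))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (sum_joint - expected) / (max_index - expected)


# Toy data (hypothetical): 10 cells x 3 genes.
# Gene 0 is zero everywhere, gene 1 is expressed in one cell,
# gene 2 is expressed in every cell.
toy = [[0, 0, 5] for _ in range(9)] + [[0, 3, 5]]
kept = filter_genes(toy)

# Cluster ids are arbitrary, so a relabeled but identical partition
# still counts as a perfect match.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because ARI counts pairs of cells rather than matching label names, it needs no alignment between predicted cluster ids and the real cell-type labels, which is why indexes of this kind suit comparisons across many clustering methods.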
Keywords/Search Tags:machine learning, scRNA-Seq clustering analysis, gene filter, deep learning, autoencoder network, generative adversarial networks