| In recent years, the rapid development of single-cell RNA sequencing (scRNA-seq) technology has dramatically changed transcriptomics research. On the one hand, the cell is the basic unit of an organism, and mining data at the single-cell level helps researchers probe the nature and laws of life activities and explore the formation, development, and treatment of complex diseases. On the other hand, as scRNA-seq technology advances, the scale of scRNA-seq data keeps growing, which poses enormous challenges for analysis and computation. Converting high-dimensional data into a low-dimensional embedding while preserving the topological structure of the raw data plays an indispensable role in scRNA-seq analysis. In addition, the high noise in scRNA-seq data makes dimensionality reduction difficult; one of the most challenging sources of noise is dropout events. If the data are not processed carefully before analysis, the results of downstream analysis become unreliable, so denoising is an integral part of dimensionality reduction. To address these challenges, a dimensionality reduction algorithm for scRNA-seq data based on a hierarchical autoencoder, named SCDRHA, is proposed; it performs both dimensionality reduction and data denoising. The SCDRHA pipeline consists of two core modules. The first module denoises scRNA-seq data using a deep count autoencoder, and the second performs dimensionality reduction using a graph autoencoder. First, SCDRHA normalizes the raw count matrix. It then uses a deep count autoencoder to reduce noise. Because scRNA-seq data approximately follow a zero-inflated negative binomial (ZINB) distribution, SCDRHA uses the autoencoder framework to estimate the three parameters of the ZINB distribution for each gene, conditioned on the input data. Unlike traditional autoencoders, the loss function is based on the likelihood of the ZINB distribution. Because the raw data
and the reconstructed data have the same dimension, SCDRHA performs an initial dimensionality reduction on the reconstructed data using principal component analysis. Finally, to preserve the topological structure of the original data, SCDRHA builds a graph autoencoder based on graph attention networks to further reduce the dimension and obtain a low-dimensional embedding for visualization and clustering. Five real scRNA-seq datasets with known cell types are used to assess the performance of SCDRHA. Because SCDRHA involves denoising, we compare it with both state-of-the-art dimensionality reduction algorithms and denoising algorithms. We use the average silhouette value to measure the quality of dimensionality reduction. To further evaluate SCDRHA, we use normalized mutual information and the adjusted Rand index to assess clustering performance. Experimental results demonstrate that SCDRHA outperforms existing state-of-the-art algorithms in dimensionality reduction and noise reduction on scRNA-seq data. In addition, SCDRHA substantially improves data visualization and cell clustering. |
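
The ZINB-based loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the three ZINB parameters, so this sketch assumes the common parameterization by mean `mu`, dispersion `theta`, and dropout probability `pi`; the function names `nb_log_pmf`, `zinb_nll`, and `mean_zinb_nll` are hypothetical and are not taken from SCDRHA. It shows only how a negative log-likelihood would be evaluated for given parameters, not how the autoencoder estimates them.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a ZINB distribution.

    pi in [0, 1) is the assumed dropout (zero-inflation) probability.
    """
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


def mean_zinb_nll(counts, mus, thetas, pis):
    """Average ZINB negative log-likelihood over paired observations."""
    losses = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(losses) / len(losses)
```

In a deep count autoencoder, a loss of this shape would typically be minimized with respect to the network outputs that play the role of `mu`, `theta`, and `pi`; a larger `pi` makes observed zeros cheaper, which is how dropout events are absorbed by the model.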