Recently, numerous high-throughput single-cell sequencing methods have been developed, significantly impacting biology and intelligent information processing. The variational autoencoder (VAE), a mainstream deep learning framework, has gradually been applied to bioinformatics. However, how to design network structures tailored to specific kinds of biological data still requires further investigation. Moreover, the inherent characteristics of single-cell data, namely high noise, high sparsity, and high dimensionality, make it difficult for researchers to distinguish cell types. This thesis therefore explores the following aspects:

(1) To improve the accuracy of single-cell clustering, this thesis introduces a novel clustering method based on a deep generative model. The proposed method combines a variational autoencoder with a Bayesian Gaussian mixture model to analyze single-cell sequencing data. Because the Bayesian Gaussian mixture model infers the number of cell types from the data, the number of clusters does not have to be fixed in advance, which improves clustering accuracy. We evaluated the method on six publicly available single-cell datasets, and the results show superior denoising performance compared with several baseline models.

(2) To address sample imbalance in single-cell data, this thesis proposes a deep residual generative model based on semi-supervised learning. The method introduces a residual network into a semi-supervised generative model and uses semi-supervised learning to alleviate sample imbalance. During training, the residual neural network performs cell-type inference, extracting local features from the single-cell data and strengthening the model's feature extraction capability. Experimental results show that this approach achieves higher accuracy than competing methods.

(3) To tackle the sparsity of single-cell data, this thesis presents a semi-supervised deep generative model with a self-attention mechanism. The model uses neural networks with self-attention to predict cell types, which allows it to extract correlated features between cells and strengthens its feature extraction capability. In addition, the model's generative capacity is used to impute missing data, directly addressing the sparsity of single-cell datasets. Experimental results on several simulated and real-world datasets demonstrate that the approach outperforms previous methods in classifying cell types and characterizing single-cell data.

(4) This thesis presents a feature extraction approach for single-cell datasets based on a semi-supervised deep generative model with multi-scale attention. A multi-scale attention module, which combines the self-attention mechanism with a convolutional neural network layer, is incorporated into the semi-supervised deep generative model. This addresses the tendency of self-attention networks to focus excessively on global features, allowing the network to consider both local and global features. As a result, the model extracts features from single-cell datasets effectively, and experimental results on real datasets confirm its accuracy.

The method proposed in Research 1 is a clustering algorithm, whereas the methods proposed in Research 2-4 are classification methods. These approaches are parallel and independent of one another, and all are based on deep learning. Beyond predicting single-cell types with strong clustering and classification performance, they can also serve as pre-trained models, facilitating subsequent research.
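To illustrate how the Bayesian Gaussian mixture in Research 1 can avoid fixing the cluster count in advance, here is a minimal sketch, assuming the VAE has already produced low-dimensional latent codes (simulated here as three synthetic blobs). It uses scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process prior as a stand-in for the thesis's own model; the helper `count_effective_clusters`, the upper bound of 10 components, the prior value of 0.01, and the 0.05 weight threshold are hypothetical choices, not the thesis's settings.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def count_effective_clusters(latent, max_components=10, weight_threshold=0.05, seed=0):
    """Fit a Dirichlet-process Gaussian mixture to latent codes and count the
    components whose mixing weight stays above a (hypothetical) threshold."""
    bgmm = BayesianGaussianMixture(
        n_components=max_components,  # an upper bound, not the final cluster count
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,  # small prior favours fewer active components
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    )
    labels = bgmm.fit_predict(latent)
    n_effective = int(np.sum(bgmm.weights_ > weight_threshold))
    return n_effective, labels


# Synthetic 2-D "latent codes" standing in for VAE encoder outputs (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [-6.0, 6.0]])
latent = np.vstack([c + 0.5 * rng.standard_normal((200, 2)) for c in centers])

n_effective, labels = count_effective_clusters(latent)
print("estimated number of clusters:", n_effective)
```

With a small concentration prior, components that are not needed tend to receive near-zero weights, so the number of components that keep a meaningful weight serves as an estimate of the number of cell types.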
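The residual idea in Research 2 can be sketched as a minimal NumPy forward pass, assuming a fully connected pre-activation residual block followed by a softmax classification head. The function names, layer sizes, and random weights are hypothetical, and the semi-supervised training and imbalance handling described in Research 2 are not shown.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, b1, w2, b2):
    """Hypothetical residual MLP block: out = x + (relu(x W1 + b1) W2 + b2).
    The skip connection carries the input through; the branch learns a correction."""
    h = relu(x @ w1 + b1)
    return x + h @ w2 + b2


def classify(x, blocks, w_out, b_out):
    """Apply a stack of residual blocks, then a softmax over cell types."""
    for params in blocks:
        x = residual_block(x, *params)
    logits = x @ w_out + b_out
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy sizes (assumptions): 5 cells, 8 input features, 16 hidden units, 3 cell types.
rng = np.random.default_rng(0)
n_cells, n_genes, hidden, n_types = 5, 8, 16, 3
x = rng.standard_normal((n_cells, n_genes))
blocks = [
    (0.1 * rng.standard_normal((n_genes, hidden)), np.zeros(hidden),
     0.1 * rng.standard_normal((hidden, n_genes)), np.zeros(n_genes))
    for _ in range(2)
]
w_out = 0.1 * rng.standard_normal((n_genes, n_types))
b_out = np.zeros(n_types)

probs = classify(x, blocks, w_out, b_out)
print(probs.round(3))
```

Because each block only adds a learned correction to its input, setting the branch weights to zero reduces the block to the identity, which is what makes deeper stacks easier to train.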
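For Research 3, the sketch below illustrates two ingredients under stated assumptions: scaled dot-product self-attention in which each row is a cell, so the attention weights relate cells to one another, and a simple imputation rule that fills zero entries with the generative model's reconstruction. The names `cell_self_attention` and `impute_dropouts`, the projection sizes, and the choice to treat every zero as a candidate dropout are illustrative assumptions rather than the thesis's design.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cell_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention with one row per cell: each cell's
    output is a weighted mix of the value vectors of all cells."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])  # (n_cells, n_cells) cell-to-cell scores
    attn = softmax(scores, axis=1)
    return attn @ v, attn


def impute_dropouts(counts, reconstruction):
    """Hypothetical rule: replace zero entries with the model's reconstruction
    and keep every observed non-zero value unchanged."""
    return np.where(counts == 0, reconstruction, counts)


# Toy input (assumption): 6 cells with 10 features, projected to 4 dimensions.
rng = np.random.default_rng(1)
x = rng.standard_normal((6, 10))
wq, wk, wv = (rng.standard_normal((10, 4)) for _ in range(3))
out, attn = cell_self_attention(x, wq, wk, wv)

counts = np.array([[0.0, 2.0], [3.0, 0.0]])
reconstruction = np.array([[0.5, 1.9], [2.8, 0.7]])
print(impute_dropouts(counts, reconstruction))
```

In practice not every zero is a technical dropout, so a real imputation step would need a more selective criterion than this sketch uses.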
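One possible reading of the multi-scale attention module in Research 4 is sketched below, assuming a global branch (self-attention across cells) and a local branch (a 1-D convolution along each cell's feature vector) that are fused by a weighted sum. The fusion rule, the weight `alpha`, the kernel, and all function names are hypothetical; the thesis may combine the two branches differently.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_branch(x, wq, wk, wv):
    """Self-attention across cells: captures global, cell-to-cell structure."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return attn @ v


def local_branch(x, kernel):
    """1-D convolution along each cell's feature axis: every output feature
    depends only on a small window of neighbouring input features."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])


def multi_scale_attention(x, wq, wk, wv, kernel, alpha=0.5):
    # Hypothetical fusion: convex combination of global and local features.
    return alpha * global_branch(x, wq, wk, wv) + (1.0 - alpha) * local_branch(x, kernel)


# Toy sizes (assumptions): 6 cells, 10 features; wv keeps the feature width so
# both branches return arrays of the same shape.
rng = np.random.default_rng(2)
x = rng.standard_normal((6, 10))
wq, wk = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))
wv = rng.standard_normal((10, 10))
kernel = np.array([0.25, 0.5, 0.25])

features = multi_scale_attention(x, wq, wk, wv, kernel)
print(features.shape)
```

Setting `alpha` to 1 keeps only the global self-attention view, while 0 keeps only the local convolutional view; intermediate values are one simple way to let the network use both.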