Research On Clustering Model Of Single-cell RNA Sequencing Data Based On Deep Learning

Posted on: 2024-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: C X Kong
GTID: 2530306923484784
Subject: Electronic information
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) data are high-dimensional and noisy. Each cell carries measurements for thousands of genes, which leads to the curse of dimensionality. Because the RNA capture rate is low, some expressed genes go undetected, producing many false zeros that differ from the true expression levels; these are known as dropout events. Dropout events and the curse of dimensionality make cell clustering on scRNA-seq data very challenging, so clustering methods for large, high-dimensional, high-noise scRNA-seq data are needed. Based on deep learning theory, this thesis proposes two clustering models, each targeting different characteristics of scRNA-seq data. The main research contents are as follows:

(1) To address the combination of high dimensionality and high noise in scRNA-seq data, a hierarchical embedding clustering model based on a graph convolutional autoencoder, DGGAE, is proposed. The model integrates denoising and dimensionality reduction, and performs denoising, dimensionality reduction and clustering in successive stages. In the denoising stage, the negative log-likelihood of the zero-inflated negative binomial (ZINB) distribution is used as the loss function of a deep denoising autoencoder to handle dropout noise in the data. In the dimensionality reduction stage, a dual-decoder graph convolutional autoencoder captures both the topological structure among the data points and the features of the data themselves. In the clustering stage, KL divergence is used as the clustering loss for hierarchical deep embedded clustering. DGGAE thus performs denoising, dimensionality reduction and clustering layer by layer, and each layer improves the accuracy of the clustering result. In experiments on nine real high-dimensional, high-noise datasets, DGGAE achieves higher ARI and NMI values than other traditional clustering methods.

(2) To address the difficulty of learning the high-order structural features of scRNA-seq data during clustering, a dual self-supervised clustering model based on a graph convolutional network, scSDCN, is proposed. The model uses graph convolution layers to learn the high-order structural information of the data, uses a deep autoencoder to learn the features of the data themselves, and uses a dual self-supervision mechanism to integrate the graph convolution layers and the deep autoencoder into a unified clustering framework. Through transfer operators, the features learned by each layer of the deep autoencoder are added to the graph convolution layer at the corresponding level, so that the graph convolution layers learn the features of the data themselves and the corresponding high-order structural features in parallel. In addition, the graph convolution layers learn the high-order structural features among the data points from a KNN graph of the data, so the quality of the KNN graph is critical. A denoising autoencoder is therefore added to the model so that a high-quality KNN graph can be built from low-noise data. Under the dual self-supervision mechanism, the clustering distribution of the graph convolution layers and that of the deep autoencoder both converge towards the target distribution, which improves the clustering accuracy of scSDCN. In experiments on four real datasets, scSDCN achieves higher ARI and NMI values than other clustering methods.

In summary, this thesis builds two clustering models with different methods for different problems, and both improve the clustering accuracy of single-cell RNA sequencing data.
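
The two loss functions named in the abstract can be illustrated with a short sketch. This is not the thesis code. It assumes the standard mean/dispersion parameterisation of the ZINB distribution and the clustering loss of the deep embedded clustering (DEC) formulation (Student's t soft assignment plus a sharpened target distribution), neither of which the abstract spells out. All function and parameter names (zinb_nll, soft_assign, target_distribution, kl_divergence, mu, theta, pi, alpha) are hypothetical, and plain Python is used so the example runs without any deep learning framework.

```python
import math


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of one count x under a ZINB distribution.

    Assumed parameterisation: mu = NB mean, theta = NB dispersion,
    pi = probability that the observation is a dropout (structural zero).
    """
    # log of the negative binomial pmf with mean mu and dispersion theta
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu) + eps)
              + x * math.log(mu / (theta + mu) + eps))
    if x == 0:
        # a zero is either a dropout (pi) or a genuine NB zero
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb) + eps)
    # a non-zero count can only come from the NB component
    return -(math.log(1.0 - pi + eps) + log_nb)


def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z_i to cluster centers."""
    q = []
    for zi in z:
        weights = []
        for c in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, c))
            weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q, eps=1e-12):
    """Clustering loss KL(P || Q), summed over all cells and clusters."""
    return sum(pij * math.log((pij + eps) / (qij + eps))
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row))


# Toy usage: four 2-D embeddings, two cluster centers (made-up numbers).
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(embeddings, centers)
p = target_distribution(q)
clustering_loss = kl_divergence(p, q)
denoising_loss = zinb_nll(x=0, mu=5.0, theta=2.0, pi=0.3)
```

In a trained model, mu, theta and pi would be outputs of the denoising autoencoder and the embeddings would come from the encoder; here they are fixed numbers so that the two losses can be inspected in isolation.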
Keywords/Search Tags:Graph convolutional network, Autoencoder, Denoise, Dimensionality reduction, Clustering