
Research On Single Cell Data Integration Method Based On Generative Adversarial Networks

Posted on: 2024-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
GTID: 2530306938959169
Subject: Computer application technology
Abstract/Summary:
Single Cell Sequencing (SCS) technology refers to the sequencing of the genome or transcriptome of individual cells. By sequencing the whole genome, transcriptome, and epigenome of single cells, it yields data at multiple molecular levels, such as the genome and the transcriptome. The technology is used to study population differentiation, variation, and evolutionary patterns across species. It also helps to gain a deeper understanding of the heterogeneous mechanisms underlying the development and progression of diseases, providing new insights for disease diagnosis, prognostic assessment, and monitoring of drug efficacy. SCS makes it possible to discover heterogeneity among individual cells, distinguish small cell populations, and construct cell atlases. It is currently widely used in fields such as species identification, pathogen screening, pathogen evolution, tumor heterogeneity, and circulating tumor cell research. However, results obtained with different experimental techniques and data modalities may be biased, and shortcomings in data processing and feature extraction can affect cell clustering, the identification of cell subpopulations, and all other downstream analyses. Existing methods for processing and integrating single cell data still suffer from low efficiency, limited interpretability, and suboptimal clustering performance. It is therefore important to develop effective and accurate methods for integrating single cell data from multiple sources, both to strengthen single cell sequencing technology and to accelerate life science research. To this end, we propose a single cell data integration model called Convolution Multiscale GAN (CMGAN), based on Generative Adversarial Networks (GANs) and multiscale convolutional neural networks. The main research objectives and innovations are as follows:

(1) Building on the "cycle consistency" principle of GANs, we improve the GAN-based Portal framework in two ways. First, we introduce convolutional neural networks to strengthen the representation learning module. Second, in the GAN module, we replace the original multi-layer perceptron discriminator with a multiscale convolutional discriminator and modify its loss function. Replacing Portal's representation learning module and GAN module with these improved modules yields the new CMGAN model. To validate the model, we analyze and design the single cell data integration process based on the improved GAN and develop an improved GAN-based algorithm for integrating single cell data.

(2) Using a CPU/GPU parallel mode, we parallelize the preprocessing, generator, and discriminator modules of CMGAN. This yields a parallel GAN-based algorithm for single cell data integration that speeds up the preprocessing, integration, generation, and discrimination of massive single cell datasets.

(3) Both the CMGAN algorithm and its parallel version are implemented in Python and are trained and evaluated on multiple real single cell datasets. CMGAN achieves an Adjusted Rand Index (ARI) of 0.188, a Normalized Mutual Information (NMI) of 0.317, and a Silhouette Coefficient of 0.478. Among deep learning models, these values exceed those of Portal by 0.034, 0.131, and 0.052, those of IMAP by 0.067, 0.123, and 0.002, and those of scGen by 0.052, 0.128, and 0.042, respectively. Among non-deep-learning algorithms, CMGAN exceeds Seurat by 0.068, 0.047, and 0.0313, Harmony by 0.063, 0.019, and 0.014, and Liger by 0.108, 0.071, and 0.052. The parallel CMGAN algorithm reduces the integration time of CMGAN by 83.24 seconds, a speedup (acceleration ratio) of 1.443.

The experimental results show that, for single cell data integration, the CMGAN model and its integration algorithm outperform the deep learning models Portal, IMAP, and scGen as well as the non-deep-learning algorithms Seurat, Harmony, and Liger. CMGAN is therefore an effective GAN-based method for single cell data integration, with good trainability, interpretability, and visualization quality.
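The two architectural ideas in objective (1), a cycle-consistency objective and a multiscale convolutional discriminator, can be illustrated with a minimal, framework-free sketch. It assumes the common L1 form of the cycle-consistency term, and it uses a hypothetical discriminator built from parallel 1-D convolutions with kernel widths 3, 5 and 7. The abstract does not give CMGAN's kernel sizes, pooling, loss weights or its modified discriminator loss. Every name and hyperparameter here (`cycle_consistency_loss`, `MultiscaleConvDiscriminator`, `kernel_sizes`, the toy generators) is therefore an illustrative assumption, and the weights are random rather than trained.

```python
import math
import random


def cycle_consistency_loss(batch, g_ab, g_ba):
    """Mean L1 error between each x and its round trip G_BA(G_AB(x))."""
    total = 0.0
    for x in batch:
        x_rec = g_ba(g_ab(x))
        total += sum(abs(u - v) for u, v in zip(x, x_rec)) / len(x)
    return total / len(batch)


def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation (no padding), as computed by CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


class MultiscaleConvDiscriminator:
    """Hypothetical multiscale discriminator (an assumption, not CMGAN's design):
    one 1-D convolution branch per kernel width, ReLU and global max pooling per
    branch, then a linear layer and sigmoid giving P(x is from the real domain).
    Weights are random (seeded), not trained."""

    def __init__(self, kernel_sizes=(3, 5, 7), seed=0):
        rng = random.Random(seed)
        self.kernels = [[rng.gauss(0.0, 0.5) for _ in range(k)] for k in kernel_sizes]
        self.weights = [rng.gauss(0.0, 0.5) for _ in kernel_sizes]
        self.bias = 0.0

    def __call__(self, x):
        # The input length must be at least the largest kernel width.
        features = []
        for kernel in self.kernels:
            activations = [max(0.0, v) for v in conv1d_valid(x, kernel)]  # ReLU
            features.append(max(activations))  # global max pooling over positions
        logit = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-logit))


def toy_g_ab(x):
    """Hypothetical generator mapping domain A to domain B."""
    return [2.0 * v + 1.0 for v in x]


def toy_g_ba(y):
    """Hypothetical generator mapping B back to A (exact inverse of toy_g_ab)."""
    return [(v - 1.0) / 2.0 for v in y]


if __name__ == "__main__":
    cell = [0.0, 0.5, 1.0, 0.25, 0.75, 1.5, 2.0, 0.125]  # toy expression vector
    print(cycle_consistency_loss([cell], toy_g_ab, toy_g_ba))
    print(MultiscaleConvDiscriminator()(toy_g_ab(cell)))
```

Each branch sees the input through a different receptive-field width, and the discriminator combines all scales before its real/fake decision. A practical version would use learned convolution layers in a deep learning framework and train them adversarially against the generators.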
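The three scores quoted in objective (3) have standard definitions. The sketch below computes them in pure Python so the reported numbers can be read concretely. It assumes ARI in the Hubert and Arabie adjusted form, NMI normalized by the arithmetic mean of the two label entropies, and the Silhouette Coefficient with Euclidean distance. The abstract does not state which variants, labels or embedding the thesis used, and the function names are hypothetical.

```python
from collections import Counter
from math import comb, dist, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two labelings (needs at least 2 cells)."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate: both one cluster, or both all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with the arithmetic mean of the two entropies as normalizer."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))
    count_a = Counter(labels_true)
    count_b = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in pair_counts.items()
    )
    normalizer = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mutual_info / normalizer if normalizer > 0 else 1.0


def silhouette_coefficient(points, labels):
    """Mean silhouette width with Euclidean distance; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's cluster (p itself adds 0)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]  # e.g. annotated cell types
    pred = [0, 0, 1, 1, 1, 1]   # e.g. clusters found on an integrated embedding
    print(adjusted_rand_index(truth, pred), normalized_mutual_info(truth, pred))
```

In practice, `adjusted_rand_score`, `normalized_mutual_info_score` and `silhouette_score` in scikit-learn's `sklearn.metrics` compute these quantities. Writing them out shows that ARI and NMI compare cluster assignments with reference labels. The silhouette instead measures cluster separation in the embedding space itself.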
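The two speed figures in objective (3) can be related by a short calculation. The calculation rests on an assumption the abstract does not state: that 83.24 seconds is the reduction in wall-clock integration time, and that the acceleration ratio is the conventional speedup $S = T_s / T_p$. Here $T_s$ and $T_p$ are symbols introduced for the serial and parallel run times. The run times below are derived estimates, not values reported in the thesis.

```latex
T_s - T_p = 83.24\,\text{s}, \qquad S = \frac{T_s}{T_p} = 1.443
\;\Longrightarrow\;
(S - 1)\,T_p = 83.24\,\text{s}
\;\Longrightarrow\;
T_p = \frac{83.24}{0.443} \approx 187.90\,\text{s}, \qquad
T_s = S\,T_p \approx 271.14\,\text{s}
```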
Keywords/Search Tags: single cell sequencing, generative adversarial networks, multiscale convolutional neural networks, data integration model