| In recent years, with the rapid development of single-cell sequencing, single-cell RNA sequencing (scRNA-seq) has become a key technology in genomics research. Compared with traditional sequencing methods, its main advantage is that it captures the genetic information of individual cells, which makes it possible to study cell specificity and cell-to-cell differences from the perspective of a cell atlas, to investigate how cells cooperate, and to characterize tissue heterogeneity. Compared with bulk RNA-seq data, scRNA-seq data contain more zero values, many of which are false-negative zeros caused by technical noise, known as dropout events. Too many dropout values distort the structure and features of cellular gene expression profiles and introduce substantial bias into downstream analysis. To address imputation for highly sparse scRNA-seq data, we propose AGImpute, a conservative dropout imputation method that imputes only a small number of dropout values so as to retain as much of the original data as possible. AGImpute consists of a statistical module and a deep learning module. In the statistical module, a mixture probability model is constructed to calculate the dropout probability of genes, and a dynamic threshold estimation algorithm estimates a threshold on the number of dropout values in the gene expression of each cell, thereby locating the dropout values. The deep learning module integrates an autoencoder and a generative adversarial network to generate values for the dropout positions, and the outputs of the two modules are combined to perform imputation. Identifying dropout values before imputing them reduces the amount of imputation and retains more of the original biological information. In experiments on simulated data and real scRNA-seq datasets, AGImpute was compared with seven methods: SAVER, scImpute, MAGIC, ENHANCE, VIPER, scIGANs, and scLRTC. The results show that AGImpute achieves better downstream analysis performance while retaining more of the original data, and that it is more interpretable. AGImpute, however, addresses only the dropout problem and ignores the batch effects present in the data. Based on the assumption that total gene expression is consistent among cells of the same cluster, we therefore propose scDIIG, an imputation method that addresses the dropout problem and the batch effect problem simultaneously. scDIIG corrects batch effects with a stochastic dynamic incomplete-information game algorithm whose payoff function combines Pearson similarity with the consistency assumption. Participants are drawn at random and, through game play and evolution among cells of the same type, reach a Nash equilibrium, so that scDIIG corrects batch effects while imputing dropout values. On the same real scRNA-seq data, we compared scDIIG with the eight dropout imputation methods mentioned above: AGImpute, SAVER, scImpute, MAGIC, ENHANCE, VIPER, scIGANs, and scLRTC. The results show that scDIIG addresses both the dropout problem and the batch effect problem of scRNA-seq data and, without discarding much of the original biological information, achieves better results than the other eight methods in downstream analysis. In summary, this paper analyzes the characteristics of scRNA-seq platforms and data, the dropout imputation problem, and the batch effect problem; designs two effective dropout imputation methods; and provides two different solutions and application tools for research on upstream preprocessing in scRNA-seq analysis. |
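
As an illustration of how the outputs of the two AGImpute modules might be combined, the following minimal sketch assumes that the statistical module yields a per-entry dropout probability and a per-cell limit on the number of dropout values, and that the deep learning module yields a generated expression matrix of the same shape. The function names `locate_dropouts` and `combine_outputs`, their parameters, and the rule of selecting the most probable zeros up to the per-cell limit are hypothetical and are not taken from the AGImpute implementation.

```python
import numpy as np


def locate_dropouts(X, dropout_prob, max_per_cell):
    """Return a boolean mask marking the zeros treated as dropouts.

    X            : (cells x genes) expression matrix.
    dropout_prob : (cells x genes) dropout probabilities (assumed given).
    max_per_cell : per-cell limit on the number of dropouts (assumed given).
    """
    X = np.asarray(X)
    dropout_prob = np.asarray(dropout_prob)
    mask = np.zeros(X.shape, dtype=bool)
    for i in range(X.shape[0]):
        zero_idx = np.flatnonzero(X[i] == 0)          # candidate positions
        k = min(int(max_per_cell[i]), zero_idx.size)  # never exceed the zeros
        if k == 0:
            continue
        # Rank the zeros of this cell by descending dropout probability.
        order = np.argsort(-dropout_prob[i, zero_idx], kind="stable")
        mask[i, zero_idx[order[:k]]] = True
    return mask


def combine_outputs(X, generated, mask):
    """Impute only the located dropouts; keep every other entry unchanged."""
    return np.where(mask, generated, X)
```

Because only the masked positions are overwritten, all observed non-zero values and all zeros judged to be true zeros pass through unchanged, which is the sense in which such an imputation is conservative.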