
Imputation Methods For Single-cell RNA-seq Data Based On Machine Learning

Posted on: 2023-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: K Shi
GTID: 2530307058463774
Subject: Control engineering
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) can reveal gene expression patterns at single-cell resolution. Because of technical or biological noise, the gene expression matrix loses a large number of values, which leaves it with an excessive number of zeros. This phenomenon is called dropout. Dropout can distort downstream analysis, for example by causing cell types to be misclassified, so researchers need an efficient method to impute scRNA-seq data.

This thesis proposes two methods, LMF and scGAT, for imputing scRNA-seq data. LMF is a non-negative matrix factorization method with Laplacian regularization. scGAT imputes scRNA-seq data with graph attention networks; these are chosen because, for a given node, different neighbor nodes influence it to different degrees. Specifically, in LMF, since similar cells tend to have similar gene expression, cell similarity and gene similarity are computed separately and added as Laplacian regularization terms to the non-negative matrix factorization. In scGAT, a cell graph is first constructed in which nodes are cells and edges connect similar cells, and a graph network layer then aggregates the gene expression information of similar cells.

Finally, LMF and scGAT are evaluated experimentally. 1) For LMF, cluster analysis is performed on the imputed scRNA-seq data. The data are first imputed with LMF and with other methods, and then t-SNE and k-means cluster analysis are performed. Clustering is also performed with SC3. The results of the two cluster analyses indicate that LMF improves downstream clustering more than the other methods. 2) For scGAT, t-SNE and k-means cluster analysis are carried out, and the results show that scGAT improves downstream clustering. 3) For both LMF and scGAT, a random mask experiment is also performed: a certain proportion of the gene expression matrix is masked, and the masked matrix is then imputed by each imputation method. The results show that LMF and scGAT impute dropout values well.
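The LMF idea and the random mask experiment described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The objective ||X - WH||^2 + alpha*Tr(W^T L_c W) + beta*Tr(H L_g H^T), the multiplicative update rules, the dense cosine-similarity graphs, the toy Poisson data, and every function name and parameter value (`laplacian_nmf`, `random_mask`, `k`, `alpha`, `beta`, `frac`) are assumptions chosen for illustration, in the style of standard graph-regularized NMF.

```python
import numpy as np

def cosine_similarity_graph(M, eps=1e-10):
    """Dense cosine-similarity graph between the rows of a non-negative matrix."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / (norms + eps)
    S = unit @ unit.T
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def laplacian_nmf(X, k=5, alpha=0.1, beta=0.1, n_iter=200, seed=0, eps=1e-10):
    """Factor X (cells x genes) as W @ H with cell-graph and gene-graph penalties.

    Assumed objective: ||X - WH||^2 + alpha*Tr(W^T L_c W) + beta*Tr(H L_g H^T),
    where L = D - S is the graph Laplacian of each similarity graph.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    S_c = cosine_similarity_graph(X)      # cell-cell similarity
    S_g = cosine_similarity_graph(X.T)    # gene-gene similarity
    D_c = np.diag(S_c.sum(axis=1))        # degree matrices
    D_g = np.diag(S_g.sum(axis=1))
    W = rng.random((n_cells, k))
    H = rng.random((k, n_genes))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        W *= (X @ H.T + alpha * S_c @ W) / (W @ H @ H.T + alpha * D_c @ W + eps)
        H *= (W.T @ X + beta * H @ S_g) / (W.T @ W @ H + beta * H @ D_g + eps)
    return W, H

def random_mask(X, frac=0.1, seed=0):
    """Set a random fraction of the non-zero entries to zero; return data and mask."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(X)
    n_mask = int(frac * rows.size)
    pick = rng.choice(rows.size, size=n_mask, replace=False)
    mask = np.zeros(X.shape, dtype=bool)
    mask[rows[pick], cols[pick]] = True
    X_masked = X.copy()
    X_masked[mask] = 0.0
    return X_masked, mask

# Toy count matrix standing in for real scRNA-seq data (hypothetical).
X_true = np.random.default_rng(1).poisson(3.0, size=(40, 30)).astype(float)
X_masked, mask = random_mask(X_true, frac=0.1)
W, H = laplacian_nmf(X_masked, k=5)
# Fill only the masked positions with the low-rank reconstruction.
X_imputed = np.where(mask, W @ H, X_masked)
rmse = np.sqrt(np.mean((X_imputed[mask] - X_true[mask]) ** 2))
print(f"RMSE on masked entries: {rmse:.3f}")
```

In this sketch the factorization is fitted to the masked matrix as if the zeros were observed values, which is a simplification. A real pipeline would likely build sparse k-nearest-neighbor graphs rather than dense similarity matrices and would compare the masked-entry error across several imputation methods.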
Keywords/Search Tags: scRNA, Dropout, Imputation, Non-negative matrix factorization, Deep learning, Graph attention network