
Analysis Method Study Of Single Cell RNA Sequencing Data

Posted on: 2021-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Han
Full Text: PDF
GTID: 2480306047491364
Subject: Control Science and Engineering
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) is a breakthrough technology of recent years that measures RNA levels and infers gene expression profiles at single-cell resolution, providing a powerful tool for revealing gene expression differences between cells. Single-cell life science research provides a more reliable scientific basis for exploring the causes, development and treatment of major diseases. Defining cell types through unsupervised clustering based on transcriptome similarity has become one of the most powerful applications of scRNA-seq. However, because the initial amount of RNA obtained from a single cell is low, scRNA-seq data typically show high noise levels and an excess of zeros. A dropout is a "false" zero: a recorded zero that does not reflect the gene's real expression. If this inherent noise is not handled properly, it obscures the underlying biological signal and hinders downstream analysis. A scalable denoising method is therefore needed for scRNA-seq datasets, which are increasingly large, high-dimensional and sparse. In addition, most studies treat feature learning and clustering as independent tasks, and the clustering results obtained from such a stepwise procedure are usually suboptimal.

To address these problems, this thesis first preprocesses the scRNA-seq data to ensure data quality, including cell filtering, gene filtering and normalization. It then makes a reasonable assumption about the distribution of scRNA-seq data and proposes a zero-inflated negative binomial (ZINB) model to characterize the data generation process. Based on this assumption, a dedicated loss function for an autoencoder is proposed so that the gene-specific distribution parameters are learned in an unsupervised way. This allows the model to capture the true data manifold while reducing dimensionality and removing the influence of dropout noise. On this basis, the ZINB loss is further integrated with the deep embedded clustering (DEC) algorithm to improve the embedded representation while optimizing the cluster assignments, retaining the advantage of denoising. Different bottleneck layer sizes are compared, and the optimal number of clusters k is selected through multiple experiments. Finally, simulated and real datasets are used to evaluate the model comprehensively against existing methods. The experimental results show that the proposed model better explains the nonlinear relationships in the data and effectively characterizes scRNA-seq data with overdispersion and zero inflation. The results indicate that the model is robust to dropout noise, improves cluster analysis and enhances biological discovery.
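
The abstract does not give formulas for the two loss terms, so the following is only a minimal sketch of what they could look like, not the thesis's implementation. It assumes the common mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `theta`, dropout probability `pi`) and the soft assignment and target distribution from the original DEC formulation. All function names are hypothetical, and how the two terms are weighted against each other is not stated in the abstract.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under ZINB.
    pi is the probability of a dropout (a 'false' zero)."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        lik = pi + (1.0 - pi) * nb   # a zero can be a dropout or a true NB zero
    else:
        lik = (1.0 - pi) * nb        # non-zero counts can only come from the NB part
    return -math.log(lik)

def soft_assign(z, centers, alpha=1.0):
    """DEC soft assignment q_ij: Student's t kernel between embedded
    point z_i and cluster centre j, normalized over clusters."""
    q = []
    for zi in z:
        row = []
        for c in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, c))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([v / s for v in row])
    return q

def target_distribution(q):
    """DEC auxiliary target p_ij: q_ij^2 / f_j, renormalized per point,
    where f_j is the soft cluster frequency sum_i q_ij."""
    f = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / fj for v, fj in zip(row, f)]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(pv * math.log(pv / qv)
               for pr, qr in zip(p, q) for pv, qv in zip(pr, qr))
```

In an autoencoder setting, `mu`, `theta` and `pi` would be produced per gene by the decoder, and `z` would be the bottleneck representation. The ZINB term would then be summed over all cells and genes and combined with the clustering term during training.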
Keywords/Search Tags: scRNA-seq, high-dimensional sparse data, dropout, cell clustering