Single Cell Clustering Based On Gene Set Recognition And Multi-omic Data

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Wei
GTID: 2530306332989439
Subject: Control Engineering
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) builds gene expression profiles at single-cell resolution and reveals differential expression between cells, providing an unprecedented opportunity to explore the causes of cell heterogeneity. Current scRNA-seq methods generally fall into two types, full-length and tag-based. The gene or transcript expression profiles they produce are often noisy, high-dimensional, and sparse, which affects single-cell typing and downstream analysis. Alternative polyadenylation (APA) strongly influences mRNA stability and function and plays an important role in post-transcriptional regulation, so APA can provide additional information for improving cell typing results.

Based on scRNA-seq data, this thesis combines gene expression information with APA to construct an integrated framework, GSAPA. It aims to combine multi-omic methods from the gene set perspective to better handle cell typing and the identification of cell subsets, and in particular to identify and assess the influence of important gene sets. First, gene set identification and gene set activity calculation are performed on the gene expression matrix and the APA site matrix, yielding a gene activity matrix and a site activity matrix. Second, a multi-omic fusion method fuses the gene expression matrix, APA site matrix, gene activity matrix, and site activity matrix. Finally, cluster evaluation methods are used to compare and assess the cell typing results.

Four public data sets were obtained from a public mouse tissue database, processed with GSAPA, and compared with other gene set activity calculation methods and multi-omic methods. The results show that GSAPA has significant advantages over the other methods on both internal and external evaluation indicators. It can distinguish different cell types and contributes to the discovery of important functional gene sets and the identification of new cell subtypes.

The innovation of GSAPA is that it uses non-negative matrix factorization to identify gene sets, so no additional gene set information needs to be obtained from external databases. In addition, multi-omic fusion is applied to both the original matrices and the activity matrices, which extends the applicability of multimodal methods to different aspects of multi-omic data. More importantly, this thesis offers an interpretation of single-cell typing results from the gene set perspective. The results show that the gene set perspective can significantly improve clustering results.
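
The gene set step described above can be illustrated with a minimal sketch. This is not the GSAPA implementation: the abstract states only that non-negative matrix factorization identifies gene sets and that an activity matrix is then computed. The multiplicative-update solver, the "top loadings per factor" rule, the mean-expression activity score, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (genes x cells) as W @ H
    using standard multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gene_sets_from_factors(W, top_n):
    """Assumed rule: each factor defines a gene set made of the
    top_n genes with the largest loadings in that factor."""
    return [np.argsort(W[:, j])[::-1][:top_n] for j in range(W.shape[1])]

def gene_set_activity(X, gene_sets):
    """Assumed score: activity of a set in a cell is the mean
    expression of the set's member genes in that cell."""
    return np.vstack([X[idx, :].mean(axis=0) for idx in gene_sets])

# Toy expression matrix (hypothetical data): 60 genes x 40 cells,
# with two blocks of co-expressed genes on top of sparse background counts.
rng = np.random.default_rng(42)
X = rng.poisson(1.0, size=(60, 40)).astype(float)
X[:20, :20] += rng.poisson(8.0, size=(20, 20))
X[20:40, 20:] += rng.poisson(8.0, size=(20, 20))

W, H = nmf(X, k=2)
sets = gene_sets_from_factors(W, top_n=10)
activity = gene_set_activity(X, sets)  # gene sets x cells
```

Under the same assumptions, the functions could be applied unchanged to an APA site matrix (sites x cells) to produce a site activity matrix, since they rely only on the input being non-negative.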
Keywords/Search Tags:scRNA-seq, APA, gene set activity, multi-omic, single cell typing