Statistical Inference Based On Tumor Single Cell Triple Omics Data

Posted on: 2021-02-05 | Degree: Master | Type: Thesis
Country: China | Candidate: L Y Zhang | Full Text: PDF
GTID: 2404330611999042 | Subject: Applied statistics
Abstract/Summary:
With the development of single-cell multi-omics parallel sequencing technology, multiple omics layers can now be measured at single-cell resolution. Combining this information allows cell variability and heterogeneity to be observed in finer detail. Analyzing DNA and RNA sequencing data from the same cell reveals genomic variation and detects DNA mutations more accurately. Joint analysis of epigenomics and transcriptomics can reveal the regulatory effects of methylation and chromatin accessibility on gene expression. Single-cell triple-omics joint analysis can identify specific cells and their functions more clearly and help explain what cell heterogeneity means. Oncogenesis is a complex biological process in which cell heterogeneity exists in the genome, epigenome, and transcriptome at the same time. The same gene may have different DNA methylation or gene expression patterns in the same tumor cell, so multiple omics layers must be combined to classify cells clearly into subgroups.

This thesis proposes a joint clustering method for single-cell triple-omics data that jointly analyzes genomics, transcriptomics, and epigenomics data measured in the same cell. The method is an improved, multi-dimensional form of hierarchical (systematic) clustering. During clustering, a matrix norm represents the distance between two cells and the sum of squared deviations represents the distance between classes. Before clustering, features are selected based on the correlation between the single-cell omics layers.

Single-cell triple-omics joint cluster analysis was performed on real data. The three omics layers yield different types of sequencing data, with differing dimensions and missing values, so quality control, missing-value imputation, data standardization, and feature dimensionality reduction are required. In this thesis, the SKNN (Sequential K-Nearest Neighbor), Linnorm, and t-SNE (t-distributed Stochastic Neighbor Embedding) algorithms are used, respectively, to impute missing values, standardize, and reduce the dimensionality of the single-cell multi-omics data. To combine the single-cell multi-omics information more accurately, the Pearson correlation coefficient, Spearman rank correlation coefficient, Kendall's tau-b rank correlation coefficient, and mutual information were used to analyze the correlation between the different omics data.

The joint clustering results were then analyzed and cell subtypes were annotated. Compared with single-omics clustering, triple-omics clustering improves clustering accuracy and identifies cell subpopulations more accurately. In joint clustering on the real data, clustering with the matrix spectral norm performed best, and the combination of gene expression and gene methylation gave the highest accuracy. Finally, based on the clustering results, the differentially expressed genes of each cluster were identified and cell subtypes were labeled. From these results, single-cell heterogeneity and tumor pathogenesis can be inferred, providing a theoretical basis for finding potential disease mechanisms and therapeutic targets.
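The clustering idea in the abstract (a matrix norm as the cell-to-cell distance, the sum of squared deviations as the class-to-class distance) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the layout of each cell as a features-by-omics matrix, the function names `pairwise_matrix_norm_distances` and `joint_cluster`, and the use of SciPy's Ward linkage are all choices made here for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pairwise_matrix_norm_distances(cells, ord=2):
    """Condensed pairwise distances between cells.

    cells: array of shape (n_cells, n_features, n_omics); each cell is
           assumed (for this sketch) to be a features-by-omics matrix.
    ord:   matrix norm passed to numpy.linalg.norm
           (2 = spectral norm, 'fro' = Frobenius norm).
    """
    n = cells.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Distance between two cells = norm of the difference matrix.
            d = np.linalg.norm(cells[i] - cells[j], ord=ord)
            dist[i, j] = dist[j, i] = d
    # Convert the square matrix to the condensed form linkage() expects.
    return squareform(dist, checks=False)


def joint_cluster(cells, n_clusters, ord=2):
    """Hierarchical clustering with Ward (sum of squared deviations) linkage."""
    condensed = pairwise_matrix_norm_distances(cells, ord=ord)
    # Ward's update is exact only for Euclidean distances; the Frobenius
    # norm satisfies this, other matrix norms make it an approximation.
    tree = linkage(condensed, method="ward")
    # Cut the tree into at most n_clusters flat clusters (labels start at 1).
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Calling `joint_cluster(cells, n_clusters=3)` returns one integer label per cell. Passing `ord='fro'` makes the distance equal to the Euclidean distance between flattened cell matrices, in which case the Ward criterion holds exactly; with the spectral norm the same update rule is applied but should be read as a heuristic.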
Keywords/Search Tags: single cell triple omics, correlation, joint clustering, matrix norm