
Dimensionality Reduction Of Omics Data Based On Complex Network And Its Application

Posted on: 2022-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
GTID: 2480306563460974
Subject: Computer technology
Abstract/Summary:
Omics data have a wide range of applications and great potential in life-science research and clinical medicine. Single-cell transcriptomic data can resolve the gene expression of each individual cell, distinguish different cell populations, and reveal new cell types. However, the global features of single-cell transcriptomic data are difficult to grasp intuitively; they also create computational difficulties and may obscure the true underlying low-dimensional structure. Projecting high-dimensional data into a low-dimensional subspace is an effective remedy. However, single-cell transcriptomic data are noisy and have low coverage, with a large number of dropout events, so applying traditional dimensionality reduction methods directly is ineffective. To overcome this problem, this thesis proposes a network-based dimensionality reduction framework for single-cell data, comprising a dimensionality reduction process based on a single-cell heterogeneous network and one based on a single-cell homogeneous network. The framework is compatible with most dimensionality reduction algorithms. The main contributions are as follows.

First, the single-cell heterogeneous network is a bipartite graph composed of cell nodes and gene nodes. In single-cell transcriptomic data, cells of the same type should have similar gene expression, so the edges incident to their cell nodes should also be similar. Based on this, a dimensionality reduction process for the single-cell heterogeneous network (SCHeN) is proposed. The LINE and node2vec algorithms (denoted SCHeN_LN and SCHeN_NV) reduce the dimensionality according to the second-order similarity of cell nodes in the heterogeneous network. The results are evaluated comprehensively using WB, NMI, ARI, and two-dimensional visualization. The experimental results show that SCHeN outperforms direct clustering as well as traditional PCA and t-SNE. SCHeN_LN performed well and stably on five single-cell datasets.

Second, in the dimensionality reduction process for the single-cell homogeneous network (SCHoN), each single-cell transcriptomic sample is treated as a node in the network, the gene expression values of that sample serve as the node's features, and the Spearman correlation coefficient between nodes is used to construct the edges, yielding the single-cell homogeneous network. UMAP, ProNE, and DeepWalk (denoted SCHoN_UM, SCHoN_PN, and SCHoN_DW) are used to reduce the dimensionality of this network. In addition, SCHoN_GCN_VAE is designed by combining a GCN with a VAE. SCHoN_UM performed well and stably on five single-cell datasets, while SCHoN_GCN_VAE showed some advantages on large datasets.

Third, the network-based single-cell dimensionality reduction framework is applied to protein sequencing data on gene expression in human brain regions, to identify differences between cerebral ischemic stroke samples and healthy controls, followed by functional enrichment analysis of those differences. This analysis identified the gene SERPINF2 as related to cerebral ischemic stroke, along with several signaling pathways related to its symptoms. For example, the complement and coagulation cascades are associated with primary immunodeficiency, blood diseases, nervous system diseases, vascular diseases, and congenital metabolic disorders, which is consistent with the fact that acute cerebral infarction can affect the human nervous, immune, and cardiovascular systems.
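The two network constructions summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how Spearman similarities become edges or how cell-gene edges are weighted, so the k-nearest-neighbour rule for the homogeneous (SCHoN-style) network and the nonzero/log1p rule for the bipartite (SCHeN-style) network are hypothetical choices, as are all function names. Spearman correlation is computed as the Pearson correlation of per-cell gene ranks, with tied values (such as dropout zeros) given their average rank.

```python
import numpy as np


def average_ranks(values):
    """Rank a 1-D array (1-based), giving tied entries such as dropout zeros their average rank."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman_matrix(expr):
    """Cell-by-cell Spearman correlation = Pearson correlation of per-cell gene ranks."""
    ranked = np.vstack([average_ranks(row) for row in expr])
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(ranked)
    # Cells with constant expression have undefined correlation; treat it as 0.
    return np.nan_to_num(corr, nan=0.0)


def homogeneous_knn_edges(expr, k=2):
    """Hypothetical SCHoN-style graph: link each cell to its k most Spearman-similar cells."""
    sim = spearman_matrix(expr)
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    edges = {}
    for i in range(sim.shape[0]):
        for j in np.argsort(sim[i])[::-1][:k]:
            j = int(j)
            key = (i, j) if i < j else (j, i)  # undirected edge
            edges[key] = float(sim[i, j])      # weight = Spearman similarity
    return edges


def heterogeneous_bipartite_edges(expr):
    """Hypothetical SCHeN-style graph: a cell-gene edge wherever expression is nonzero."""
    cells, genes = np.nonzero(expr)
    # Weight each edge by log1p(expression); dropout zeros produce no edge.
    return [(f"cell{c}", f"gene{g}", float(np.log1p(expr[c, g])))
            for c, g in zip(cells, genes)]


# Toy matrix: 4 cells x 5 genes; cells 0-1 and cells 2-3 share expression patterns.
expr = np.array([
    [5, 3, 0, 0, 1],
    [6, 4, 0, 0, 1],
    [0, 0, 7, 5, 1],
    [0, 1, 8, 6, 1],
], dtype=float)

print(homogeneous_knn_edges(expr, k=1))  # links cell 0 with cell 1 and cell 2 with cell 3
print(len(heterogeneous_bipartite_edges(expr)), "cell-gene edges")
```

Either edge list could then be handed to a graph-embedding method such as LINE, node2vec, or DeepWalk to obtain low-dimensional cell coordinates; that step is not shown. In the bipartite sketch, cells 0 and 1 connect to the same genes, which is the kind of shared neighbourhood that second-order similarity is meant to capture.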
Keywords/Search Tags: Dimension reduction, Single cell, Cerebral ischemic stroke, Clustering, Homogeneous network, Heterogeneous network