Cancer Subtype Discovery Based On Random Walk

Posted on: 2020-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2404330575463023
Subject: Computer Science and Technology
Abstract/Summary:
Cancer is one of the main diseases threatening human life. With the development of high-throughput sequencing technology, a large amount of multi-omics biomolecular data has been generated, which brings opportunities for research on the mechanisms and therapy of cancer. A range of machine learning and computational methods have been proposed to make effective use of these data. Among them, the discovery of cancer subtypes has become a research hotspot in oncology and bioinformatics. Dividing cancer patients into subtypes can provide a basis and guidance for precision and personalized medicine, improving treatment outcomes, and can also support the analysis of cancer mechanisms and the research and development of drug targets. Effective methods are therefore urgently needed to integrate different types of omics data and identify clinically significant cancer subtypes.

Genomic data are high-dimensional, with small sample sizes and relatively high noise, and different types of omics data carry both complementary and mutually exclusive information. Designing an effective multi-omics cancer subtype discovery method that mines the biological information in these data is therefore of great theoretical and practical significance for the research and treatment of cancer. The work in this thesis is based on the random walk model, which is used to make better use of somatic mutation data and to improve how information is integrated in the cluster ensemble method. The main work is summarized as follows.

First, we propose Network Diffusion Model Assisted Similarity Network Fusion. The influence of each mutated gene is spread over the gene interaction network, so that the resulting data are "smooth" and carry information about the gene network. A sample similarity network is then built for each data type; in this network, sample-sample similarity is no longer limited to the single-gene level but is based on the network. Finally, a fused patient similarity network containing the information from all data types is established by a nonlinear iterative method. Cancer subtypes can then be detected from the fused network with a clustering algorithm.

Second, we propose Random Walk based Cluster Ensemble (RWCE). We first obtain an improved similarity between clusters using random walk and a scaled exponential similarity kernel function. This similarity is then used to fill the incidence matrix between samples and clusters, and a sample-cluster bipartite graph is modeled. A spectral clustering algorithm is used to identify cancer subtypes.

Experimental results show that our methods have advantages over existing methods. Case studies on cancer datasets show that our methods discover subtypes with clinical and biological significance (such as differences in drug response, prognosis, and age distribution).
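
The network diffusion step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact update rule, so the code below assumes one common formulation, random walk with restart, in which a patient's binary mutation profile F0 is iterated as F <- alpha * F * W + (1 - alpha) * F0 over a row-normalized adjacency matrix W. The function names, the value alpha = 0.7, and the 4-gene toy network are all hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def row_normalize(adj):
    """Divide each row of a square adjacency matrix by its sum.

    Rows that sum to zero (isolated genes) are left as all zeros.
    """
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def diffuse(profile, adj, alpha=0.7, tol=1e-9, max_iter=1000):
    """Smooth one patient's 0/1 mutation profile over a gene network.

    Assumed update rule (random walk with restart):
        F <- alpha * F * W + (1 - alpha) * F0
    where W is the row-normalized adjacency matrix and F0 is the
    original profile. Iteration stops when F changes by less than tol.
    """
    w = row_normalize(adj)
    n = len(profile)
    f0 = [float(v) for v in profile]
    f = f0[:]
    for _ in range(max_iter):
        nxt = [
            alpha * sum(f[i] * w[i][j] for i in range(n)) + (1 - alpha) * f0[j]
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f


# Hypothetical 4-gene path network: g0 - g1 - g2 - g3
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
mutations = [1, 0, 0, 0]  # only g0 is mutated in this patient

smoothed = diffuse(mutations, adjacency)
print([round(v, 3) for v in smoothed])
```

In this toy case the unmutated genes receive nonzero scores that shrink with distance from the mutated gene, which is the sense in which the diffused profile is "smooth". Two patients with mutations in different but neighboring genes would then have similar profiles, so a sample similarity computed on the smoothed data reflects the network rather than single genes.
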
Keywords/Search Tags: Cancer subtype, multi-omics data, clustering algorithm, cluster ensemble, random walk