
Research on Cooperative Cancer Driver Pathway Identification Based on Multi-Omics Data

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Yang
GTID: 2370330611964273
Subject: Computer application technology
Abstract/Summary:
Cancer is caused mainly by the accelerated accumulation of somatic mutations over an organism's lifetime. A key step in cancer research is to distinguish driver mutations and driver genes, which push cells from a normal state to a malignant one. Studies have shown that although individual tumors exhibit diverse somatic mutations and copy number variations, many of these events tend to affect a limited number of biological pathways. Because driver genes tend to cluster in a small set of basic biological pathways, the diversity and complexity encountered when uncovering drivers at the gene level can be reduced significantly at the pathway level. In recent years, therefore, more attention has been paid to identifying driver pathways and modules rather than individual genes. In addition, like genes, individual pathways do not operate independently: during cancer development, multiple driver pathways are likely to participate synergistically in transforming normal cells into tumor cells. Most existing driver pathway identification methods, however, focus only on a single driver pathway. The limited information provided by mutation data and the incomplete knowledge of pathway interactions make the identification of cooperative driver pathways a major challenge. This thesis studies cooperative driver pathway identification based on multi-omics data. The main contributions are as follows:
(1) To address the insufficient use of multi-omics biological data, this thesis proposes a cooperative driver pathway identification method (CoDP) based on matrix factorization and tri-random walk. CoDP first applies matrix factorization to gene and microRNA (miRNA) expression data, incorporating gene interaction and gene-miRNA regulatory networks, to obtain disease-related gene-miRNA modules. Genes and miRNAs are important genetic material and are closely related to pathways, but the available association information among genes, miRNAs, and pathways is incomplete. CoDP therefore performs a tri-random walk on the intra- and inter-relational networks of genes, miRNAs, and pathways to update the gene-pathway and miRNA-pathway association networks. Finally, it combines the gene-miRNA modules with the updated gene-pathway and miRNA-pathway association networks obtained in the previous two steps, and identifies the pathways with the highest coverage of a gene-miRNA module as cooperative driver pathways. On both ovarian and liver cancer data, CoDP effectively identifies driver pathways. Compared with existing methods, CoDP not only identifies known driver pathways but also discovers cooperative relationships among them.
(2) To address the incomplete use of somatic mutation data and prior knowledge by existing methods, this thesis proposes a novel cooperative driver pathway identification method (CDPath) based on Integer Linear Programming (ILP) and Markov clustering. CDPath first uses ILP to find gene modules that maximize coverage and mutual exclusivity within modules and maximize functional interactions and co-occurrence between modules. Next, Markov clustering is applied to the pathway interaction network to obtain clusters of strongly interacting pathways. Finally, pathways assigned to the same pathway cluster but to different modules are identified as cooperative driver pathways. On breast cancer and endometrial cancer data, CDPath statistically identifies the most known driver genes compared with existing methods, and it also identifies novel driver genes.
(3) Because most methods do not consider possible inconsistencies in gene expression across patients, this thesis proposes a novel cooperative driver pathway identification method (CoPath) based on greedy mutual exclusivity and bi-clustering. CoPath first uses a greedy search in a signaling network to find mutually exclusive gene modules in somatic mutation data. Next, it incorporates the mutually exclusive modules obtained in the previous step, together with gene interaction information, as regularization terms and applies bi-clustering to the gene expression data. Gene modules assigned to the same cluster are identified as cooperative driver pathways. On the breast and endometrial cancer datasets, the pathways identified by CoPath are significantly enriched in the largest number of functions related to cancer development, compared with the pathways identified by other existing methods. In addition, the identified cooperative driver pathways are closely connected in the signaling network.
(4) Because most methods lack a metric for verifying cooperation between pathways, this thesis proposes a novel method for identifying cooperative driver pathways (CDPathway) via genes, miRNAs, and pathways. CDPathway first integrates somatic mutation and gene interaction data to identify potential driver genes using a gene gravity model. Next, it uses the identified potential driver genes to update the association weights among genes, miRNAs, and pathways, and uses collaborative matrix factorization to reconstruct the pathway interaction network. The pathways with the highest reconstruction scores are identified as cooperative driver pathways. On breast and endometrial cancer datasets, CDPathway identifies known driver genes more accurately than existing methods, and it also accurately reconstructs the pathway interaction network. Verification against known interactions with disease-related pathways indicates that the cooperative driver pathways identified by CDPathway play a significant role in triggering cancer.
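The notions of coverage and mutual exclusivity that CDPath and CoPath rely on can be illustrated with a small sketch. The abstract does not give the exact objective used in the thesis, so the code below assumes a commonly used stand-in, the Dendrix-style weight W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, where Γ(g) is the set of patients with a mutation in gene g and Γ(M) is the union over the genes in module M. The function names, the toy `mutations` data, and the plain greedy loop (which ignores the signaling network) are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Set

# Toy somatic mutation data (hypothetical): gene -> set of patients carrying a mutation.
mutations: Dict[str, Set[str]] = {
    "A": {"p1", "p2", "p3"},
    "B": {"p4", "p5"},
    "C": {"p1", "p2", "p4"},
    "D": {"p6"},
}


def coverage(module: List[str], data: Dict[str, Set[str]]) -> Set[str]:
    """Patients with a mutation in at least one gene of the module."""
    covered: Set[str] = set()
    for gene in module:
        covered |= data[gene]
    return covered


def weight(module: List[str], data: Dict[str, Set[str]]) -> int:
    """Assumed Dendrix-style weight: rewards coverage, penalizes overlapping mutations."""
    total_mutations = sum(len(data[gene]) for gene in module)
    return 2 * len(coverage(module, data)) - total_mutations


def greedy_module(data: Dict[str, Set[str]], size: int) -> List[str]:
    """Grow a module one gene at a time, always adding the gene that yields the best weight."""
    module: List[str] = []
    candidates = sorted(data)  # sorted so that ties are broken deterministically
    while len(module) < size:
        remaining = [g for g in candidates if g not in module]
        if not remaining:
            break
        best = max(remaining, key=lambda g: weight(module + [g], data))
        module.append(best)
    return module


module = greedy_module(mutations, 3)
print(module, weight(module, mutations))
```

On this toy data, genes A and C share two patients, so adding C to a module containing A lowers the weight, whereas B and D each cover patients that A does not. An ILP formulation, as used by CDPath, would search for an optimum over all modules instead of growing one greedily.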
Keywords/Search Tags: Cooperative driver pathway identification, Multi-omics biology data, Clustering, Matrix factorization