Integrated Strategy For Gene Differential Expression And Differential Co-expression Analysis And Application In Cancer Biomarker Discovery

Posted on: 2022-08-19; Degree: Master; Type: Thesis
Country: China; Candidate: M N Wang
GTID: 2504306566991959; Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Genes are rarely regulated independently; they interconnect and interact with one another in biological processes. In gene co-expression networks, genes whose expression levels are highly correlated make up a co-expression module, which is often described as a functionally coordinated and/or co-regulated gene sub-network that shares a common function or biological process. An effective way to understand dysfunctional processes in complex human diseases is therefore to identify differential co-expression (DC) gene modules, in which genes gain or lose co-expression in disease samples relative to healthy controls. DC analysis has recently been widely used in disease studies. A second approach to understanding human disease is differential expression (DE) analysis, which identifies genes whose average expression levels differ significantly between disease and healthy conditions. Although both DC and DE genes are associated with disease-specific regulatory processes, they are often studied independently, so effective integration of DC and DE analyses remains underdeveloped and the interplay between DC and DE genes requires deeper and broader research. To address this issue, we present a novel analytical framework named DC&DEmodule, which integrates DC and DE analyses and combines information from multiple independent case/control expression datasets to identify disease-related gene co-expression modules. These include activated modules (gaining co-expression and up-regulated in disease) and dysfunctional modules (losing co-expression and down-regulated in disease).

The main contents are as follows:

(1) After developing the DC-DE integration framework, we used it to analyze six Gene Expression Omnibus (GEO) microarray datasets that included samples of liver, gastric, and colon cancer tumors and their adjacent normal tissues. For each cancer type, two individual datasets were integrated to reduce the bias of single-dataset studies and to provide more reliable predictions. The framework has two pivotal steps: 1) identifying normal/tumor co-expression gene modules conserved in the two individual datasets, and 2) identifying tumor-associated modules (activated or dysfunctional in the tumor) by integrating DC and DE analyses.

(2) By applying this framework to the liver, gastric, and colon cancer microarray data, we identified two, five, and two activated modules and five, five, and one dysfunctional module(s), respectively. Moreover, we identified 17, 69, and 11 key genes in the activated modules of the three cancers. These key genes were significantly up-regulated in tumor samples and correlated with other genes, so they may be potential diagnostic or prognostic markers.

(3) Among the key genes of the three cancer types, we found 15, 35, and 3 known prognostic markers by text mining. We then used the Kaplan-Meier plotter to analyze the effects on cancer survival of the remaining key genes, which had not been reported previously. This yielded 3 novel survival biomarkers (TCEB1, RFC4, and TRPC4AP) that were significantly correlated with overall survival in the three cancers.

(4) We used the key genes to train a random forest classifier for each of the three cancer types. The classifiers showed average accuracies of 95%, 93%, and 91.5% in classifying tumor and adjacent normal tissue samples in additional datasets obtained from The Cancer Genome Atlas (TCGA) and GEO databases.

(5) We performed KEGG and Reactome pathway enrichment analyses to study the biological functions associated with the activated and dysfunctional modules. Pathway enrichment analysis strongly suggested that the resulting modules were associated with critical cancer pathways. Moreover, to evaluate the performance of our integration-based method, we applied independent DC analysis, independent DE analysis, and DECODE to the same liver, gastric, and colon cancer microarray data. Relative to these methods, pathway enrichment analysis showed that our method was more sensitive in detecting both known cancer-related pathways and pathways not previously reported.

In summary, we believe that studying genes in the context of regulatory systems, which can be evaluated through gene co-expression modules, is critical for discovering important factors associated with human disease progression at the molecular level. In this study, we present a novel method for identifying dysregulated gene co-expression modules from case/control expression data by integrating DC and DE analyses and by integrating information from multiple individual datasets. KEGG and Reactome pathway enrichment analyses showed that our module-based method outperformed methods that aim to identify individual genes, including independent DC or DE analysis and DECODE. This is consistent with the fact that genes are rarely regulated individually, as they interact and are thus correlated with each other in biological processes. We believe that DC&DEmodule will provide profound insights into critical regulatory events in complex diseases.
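
To make the activated/dysfunctional distinction concrete, the following is a minimal sketch on synthetic data of how a single gene module could be labelled by combining a DC score with a DE score. It is not the DC&DEmodule implementation: the function names (`welch_t`, `mean_abs_corr`, `label_module`), the choice of mean absolute Pearson correlation as the co-expression measure, the Welch t-statistic as the DE measure, and the cut-offs `t_cut` and `dc_cut` are all illustrative assumptions.

```python
import numpy as np


def welch_t(case, control):
    """Per-gene Welch t-statistic; rows are genes, columns are samples."""
    mean_diff = case.mean(axis=1) - control.mean(axis=1)
    var_case = case.var(axis=1, ddof=1) / case.shape[1]
    var_ctrl = control.var(axis=1, ddof=1) / control.shape[1]
    return mean_diff / np.sqrt(var_case + var_ctrl)


def mean_abs_corr(expr):
    """Mean absolute pairwise Pearson correlation among the genes (rows)."""
    corr = np.corrcoef(expr)
    upper = np.triu_indices_from(corr, k=1)  # each gene pair once
    return np.abs(corr[upper]).mean()


def label_module(case, control, t_cut=2.0, dc_cut=0.3):
    """Label one module. Both cut-offs are assumed, illustrative values."""
    dc = mean_abs_corr(case) - mean_abs_corr(control)  # gain (+) or loss (-)
    de = np.median(welch_t(case, control))             # up (+) or down (-)
    if dc > dc_cut and de > t_cut:
        return "activated"       # gains co-expression and is up-regulated
    if dc < -dc_cut and de < -t_cut:
        return "dysfunctional"   # loses co-expression and is down-regulated
    return "unchanged"


# Synthetic module of 5 genes measured in 40 samples per condition.
rng = np.random.default_rng(0)
n_genes, n_samples = 5, 40
# Control: genes vary independently around zero.
control = rng.normal(size=(n_genes, n_samples))
# Case: genes share one latent factor and are shifted upward by 2 units.
shared = rng.normal(size=n_samples)
case = 2.0 + shared + 0.3 * rng.normal(size=(n_genes, n_samples))

print(label_module(case, control))
print(label_module(control, case))
```

By construction the synthetic module gains co-expression and is up-regulated in the case group, so swapping the two groups reverses both signals. A real analysis would also need significance testing, multiple-testing correction, and the cross-dataset conservation step described above.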
Keywords/Search Tags: cancer, co-expression network, differential co-expression analysis, differential expression, gene co-expression module, integration-based analytical framework