Background: Colorectal cancer (CRC) is a serious threat to human health. By 2035, the number of new CRC cases worldwide is expected to exceed 2 million, and the incidence and mortality of CRC in China are rising. At present, clinical decision-making and prognostic evaluation in CRC depend mainly on the tumor-node-metastasis (TNM) staging system. In clinical practice, however, patients with the same stage often have markedly different outcomes. This indicates that the current staging system does not adequately reflect individual biological heterogeneity or predict clinical outcomes, which limits its value for clinical decision-making. It is therefore of great clinical significance to identify more accurate and effective prognostic biomarkers for CRC. DNA methylation is an important epigenetic modification and regulatory mechanism. Aberrant CpG island methylation in gene promoter regions can regulate gene expression and plays an important role in the pathophysiology of CRC. Although many studies have reported that DNA methylation can serve as an effective biomarker for the early diagnosis, prognosis and evaluation of treatment efficacy in CRC, only a few DNA methylation markers have been successfully applied in clinical practice because of insufficient sensitivity and specificity. This study aims to screen and identify DNA methylation-driven genes by bioinformatics analysis of data from online public databases, and then to construct and validate a model for predicting the prognosis of CRC patients, with a view to supporting early diagnosis, prognostic assessment and clinical decision-making in CRC.

Methods:
1. DNA methylation-driven genes were identified with the R package "MethylMix" by integrating gene expression and DNA methylation data from the TCGA (The Cancer Genome Atlas) CRC cohort. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using Metascape.
2. Univariate Cox regression analysis and LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis were performed, and candidate DNA methylation-driven genes were obtained as the intersection of the genes selected by the two methods. A prognostic model for CRC was then established from the results of multivariate Cox regression analysis. Based on the model's risk score, CRC patients were divided into high-risk and low-risk groups. The prognostic model was evaluated using time-dependent receiver operating characteristic (td-ROC) curves, and Kaplan-Meier survival analysis was used to evaluate its performance in each CRC subgroup.
3. After univariate Cox regression analysis and stepwise multivariate Cox regression analysis of the risk score and traditional clinicopathological factors such as age, sex, T stage and distant metastasis, a nomogram based on the risk score, age, T stage, lymph node metastasis and distant metastasis was established. td-ROC analysis was performed to evaluate the accuracy of the nomogram in predicting overall survival (OS). Finally, the nomogram was further validated in the Gene Expression Omnibus (GEO) cohort GSE39582.
4. Weighted gene co-expression network analysis (WGCNA), gene set enrichment analysis (GSEA), gene mutation analysis and tumor-infiltrating immune cell (TIIC) analysis were used to comprehensively characterize the molecular and immune features of the high-risk and low-risk groups.

Results:
1. A total of 705 DNA methylation-driven genes were identified with the R package "MethylMix". Metascape functional enrichment analysis showed that these genes were involved in a wide range of biological processes and pathways, including cell signal transduction, cell differentiation, regulation of apoptosis and metabolism, and were implicated in the dysregulation of various physiological processes, suggesting that methylation of these genes may be associated with the progression and prognosis of CRC.
2. Nine genes (LINC01555, GSTM1, HSPA1A, VWDE, MAGEA12, ARHGAP, PTPRD, ABHD12B and TMEM88) were screened from the 705 DNA methylation-driven genes by LASSO regression analysis and univariate Cox regression analysis. The expression levels of these genes were negatively correlated with their methylation levels (P < 0.001). The prognostic model was constructed based on these nine methylation-driven genes. Kaplan-Meier survival analysis showed that OS was significantly better in the low-risk group than in the high-risk group (P = 2 × 10⁻⁸). The areas under the curve (AUCs) of the prognostic model at 1, 3 and 5 years were 0.745, 0.708 and 0.721, respectively. Kaplan-Meier survival analysis of CRC subgroups showed significantly lower OS in the high-risk group (P < 0.05).
3. After univariate Cox regression analysis and stepwise multivariate Cox regression analysis, four independent risk factors were retained and used to construct the OS nomogram: risk score (HR = 2.429, 95% CI: 1.931-3.055, P < 0.001), age (HR = 1.031, 95% CI: 1.012-1.05, P = 0.001), T stage (HR = 2.372, 95% CI: 1.431-3.929, P < 0.001) and distant metastasis (HR = 3.193, 95% CI: 1.936-5.267, P < 0.001). The concordance index (C-index) of the nomogram was 0.811, and td-ROC analysis showed AUCs of 0.815, 0.794 and 0.802 at 1, 3 and 5 years, respectively. The OS nomogram was further validated in the GEO dataset GSE39582, with a C-index of 0.722 and AUCs of 0.788, 0.743 and 0.714 at 1, 3 and 5 years, respectively.
4. WGCNA of robustly expressed genes in the TCGA cohort identified key modules associated with the prognostic model, comprising 1577 genes. GO/KEGG enrichment analysis with Metascape showed significant enrichment of "extracellular matrix organization", "vasculature development", "cell junction organization" and "cell-substrate adhesion", many of which have been reported to be involved in tumor initiation and progression. Using the cut-off criteria |MM| (module membership) > 0.8 and |GS| (gene significance) > 0.3, twelve highly connected hub genes were screened from the clinically significant module: LZTS1, TIE1, STARD8, VEGFC, KCNE4, ADAMTS1, AFAP1L1, ITGA5, BCL6B, MMRN2, CAVIN1 and CCBE1. GEPIA analysis further confirmed that high expression of LZTS1, VEGFC, KCNE4, ITGA5, CCBE1 and CAVIN1 was associated with poorer OS in CRC.
5. GSEA showed that the gene sets enriched in the high-risk group were related to tumor progression and metastasis and to inflammatory responses, including angiogenesis, epithelial-mesenchymal transition, IL6_JAK_STAT3 signaling, inflammatory response and upregulated KRAS signaling.
6. A comprehensive analysis of molecular and immune characteristics showed that the high-risk group was associated with tumor invasion, infiltration of pro-tumor, immunosuppressive immune cells (such as MDSCs, regulatory T cells and immature dendritic cells) and higher expression of common inhibitory immune checkpoint molecules (ICPs), such as PD1, PDL1, CTLA4, LAG3, TIM3 and TIGIT.

Conclusions:
1. The methylation levels of LINC01555, GSTM1, HSPA1A, VWDE, MAGEA12, ARHGAP, PTPRD, ABHD12B and TMEM88 were negatively correlated with their expression levels, and their expression levels were correlated with OS in CRC. These genes may serve as potential biomarkers for predicting OS in CRC.
2. A CRC prognostic model was constructed based on the nine DNA methylation-driven genes, and OS differed significantly between the high-risk and low-risk groups. The nomogram combining this risk score with clinicopathological factors achieved td-ROC AUCs of about 0.8 at 1, 3 and 5 years (0.708-0.745 for the nine-gene model alone), indicating good discrimination in predicting OS; it may therefore serve as an effective tool for predicting the prognosis of CRC patients.
3. A comprehensive analysis of molecular and immune characteristics showed that the high-risk group was associated with tumor invasion, infiltration of pro-tumor, immunosuppressive immune cells and higher expression of common inhibitory immune checkpoint molecules (ICPs).
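To illustrate the risk stratification in Methods step 2: in a Cox-based gene signature, the risk score is usually the linear predictor. This is the sum, over the signature genes, of each gene's multivariate Cox coefficient multiplied by its expression value. The abstract reports neither the nine coefficients nor the cut-off used to form the risk groups. The Python sketch below is therefore only a minimal illustration. It assumes hypothetical placeholder genes (`GENE_A` to `GENE_C`), made-up coefficients and expression values, and a median cut-off.

```python
from statistics import median

# Hypothetical placeholder coefficients; NOT the fitted values of the nine-gene model.
COEFS = {"GENE_A": -0.12, "GENE_B": 0.25, "GENE_C": 0.31}


def risk_score(expression, coefs=COEFS):
    """Cox linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def stratify(patients, coefs=COEFS):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median cut-off is an assumption; other cut-offs (e.g. optimal cut-points)
    are also common.
    """
    scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return groups, scores


# Toy expression profiles (made-up numbers for illustration only).
toy = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0},
    "P2": {"GENE_A": 0.5, "GENE_B": 3.0, "GENE_C": 2.0},
    "P3": {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 0.2},
    "P4": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 1.5},
}
groups, scores = stratify(toy)
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low', 'P4': 'high'}
```

Because a positive Cox coefficient raises the score, genes whose higher expression is linked to worse survival push patients toward the high-risk group.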
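The Kaplan-Meier comparisons in the Results rest on the product-limit estimator S(t) = Π(1 - d_i / n_i). Here d_i is the number of deaths and n_i the number of patients still at risk at each event time t_i. The abstract does not name the software used (presumably an R survival package), so the following is only a from-scratch Python sketch on made-up data, not the authors' code. It estimates a single curve. Comparing the high-risk and low-risk curves additionally requires a test such as the log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Toy cohort (made-up): survival drops to about 0.8, 0.6 and 0.3 at t = 1, 2, 3.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```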
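The concordance indices reported for the nomogram (0.811 in TCGA, 0.722 in GSE39582) measure how often, among comparable patient pairs, the patient who died earlier was assigned the higher predicted risk. The following is a minimal Python sketch of Harrell's C-index on made-up data. It excludes pairs tied on follow-up time, which is one of several tie-handling conventions. The abstract does not state which implementation the authors used.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an event and times[i] < times[j].
    It is concordant when risks[i] > risks[j]; tied predicted risks count as 0.5.
    Pairs tied on follow-up time are skipped (one common convention).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier-failing member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (made-up): risk ordering perfectly matches survival ordering.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as 0.811 are read as good discrimination.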