| Background: Colorectal cancer (CRC) has high incidence and mortality worldwide. Although many treatments, such as targeted therapy and immunotherapy, have been applied to CRC, outcomes remain unsatisfactory, and novel targets and biomarkers for diagnosis and individualized treatment are urgently needed. Previous studies have shown that epigenetic mechanisms regulate both normal and tumor cells and play an important role throughout the course of CRC, which offers insight into early prediction, diagnosis, and treatment. Bioinformatics has developed rapidly in recent years, and potential targets can now be identified by analyzing gene sequencing data. The public databases GEO (Gene Expression Omnibus) and TCGA (The Cancer Genome Atlas) store large amounts of gene sequencing data, clinical information, and prognostic data. This study constructed and validated an epigenetic prognostic signature for CRC based on these datasets, which may help predict the prognosis of CRC. Methods: The gene expression matrix of a CRC dataset (GSE39582) was downloaded from GEO, and the clinical information was extracted. The gene expression matrix and clinical information of TCGA-COAD were downloaded from TCGA and merged using Perl. Epigenetic genes were downloaded from EpiFactors. Prognostic genes were screened by univariate Cox regression in R, and epigenetic-related genes (ERGs) were obtained by intersecting the epigenetic genes with the prognostic genes from GEO and TCGA. GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analyses explored the pathways of the ERGs, the cBioPortal database was used to detect gene alterations in the ERGs, GeneMANIA depicted the relationships among the six ERGs, and the HPA (Human Protein Atlas) database was used to explore differential expression of the ERGs between normal and tumor tissues. A total of 446 CRC patients from TCGA with complete survival time and status were used as the training group. The prognostic model was constructed by multivariate Cox regression, each patient's risk score was calculated from the regression coefficients, the high-risk and 
low-risk groups were defined by the median risk score, and the Kaplan-Meier (K-M) curve was plotted by survival analysis. The ROC (receiver operating characteristic) curve was used to evaluate model performance, and 561 patients from GEO were used to validate the model. Further, the relationship between the risk score and common clinical characteristics of CRC patients (age, gender, T, N, M, and stage) was explored. Univariate and multivariate Cox regression analyses of 387 TCGA patients with complete clinical data identified the risk score as an independent risk factor for CRC. Finally, a nomogram and calibration curves were plotted. Results: Univariate Cox regression identified 2603 and 1368 prognostic genes in GEO and TCGA-COAD, respectively, and 720 epigenetic genes were downloaded from EpiFactors. Six ERGs (LBR, SMARCD3, ZNF532, MAPKAPK3, SFPQ, and SIN3B) were obtained by intersecting these gene sets. GO and KEGG analyses showed that the ERGs are involved in protein deacetylation and steroid synthesis, cBioPortal showed the gene alterations of the six ERGs, GeneMANIA showed the relationships among the ERGs, and HPA revealed differential expression of some ERGs between normal and tumor tissues. The prognostic model was constructed by multivariate Cox regression of the ERGs: risk score = (-0.240383164 × LBR expression) + (0.072617397 × SMARCD3 expression) + (0.02043923 × ZNF532 expression) + (-0.668433873 × MAPKAPK3 expression) + (-0.363823683 × SFPQ expression) + (0.495221703 × SIN3B expression). Patients in the training group were divided into high-risk and low-risk subgroups by the median risk score. The K-M curve, compared by the log-rank test, showed a significant difference (p < 0.05), and the AUC of the ROC curve was 0.722, indicating that the model has good predictive ability. The testing group validated the signature, with a significant p-value for the K-M curve. Finally, the relationship between the risk score and clinical characteristics was revealed, and the risk score was shown to be an independent risk factor for CRC. A nomogram incorporating the risk score and clinical characteristics was constructed, and calibration curves confirmed the precise 
prognostic ability of the signature. Conclusion: This study identified six ERGs (LBR, SMARCD3, ZNF532, MAPKAPK3, SFPQ, and SIN3B) from TCGA, GEO, and EpiFactors and constructed an epigenetic prognostic model for CRC. The model was validated and performed well, which may offer novel insight into targeted therapy for CRC in the future. |
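
The risk score and median split described in the abstract can be illustrated with a short Python sketch. It is not the authors' code (the abstract reports using R and Perl). The coefficients are copied from the reported formula. The function names, the patient identifiers, the expression values, and the rule that a score equal to the median falls in the low-risk group are assumptions made for illustration, since the abstract does not specify the expression scale or how ties are handled.

```python
from statistics import median

# Coefficients as reported in the abstract (multivariate Cox regression).
COEFFICIENTS = {
    "LBR": -0.240383164,
    "SMARCD3": 0.072617397,
    "ZNF532": 0.02043923,
    "MAPKAPK3": -0.668433873,
    "SFPQ": -0.363823683,
    "SIN3B": 0.495221703,
}


def risk_score(expression):
    """Sum of (coefficient x expression) over the six ERGs for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'.

    Assumption: a score exactly equal to the median is labeled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values, for illustration only (not data from the study).
patients = {
    "P1": {gene: 1.0 for gene in COEFFICIENTS},
    "P2": {gene: 0.0 for gene in COEFFICIENTS},
    "P3": {**{gene: 0.0 for gene in COEFFICIENTS}, "SIN3B": 2.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Genes with negative coefficients (LBR, MAPKAPK3, SFPQ) lower the score as their expression rises, while genes with positive coefficients (SMARCD3, ZNF532, SIN3B) raise it. In practice the median cutoff would be computed on the training cohort.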