| Screening potential biomarkers related to the occurrence, development, and prognosis of colorectal cancer, and revealing the physiological and pathological processes involved, is of great significance. In this thesis, DESeq2 and weighted gene co-expression network analysis (WGCNA) were used to screen genes significantly associated with colorectal cancer, and KEGG and GO enrichment analyses were used to identify the signaling pathways in which these genes are enriched and thereby explore the pathogenesis. On this basis, targeted maximum likelihood estimation (TMLE) was used to estimate the causal effect of each gene on colorectal cancer occurrence and prognostic survival. Survival analysis was then performed using the genes selected as having a causal relationship with the development of colorectal cancer. Cox proportional hazards, DeepSurv, and Nnet-survival models were fitted to colorectal cancer patients, and the concordance index (C-index) was used to evaluate model performance. A total of 176 intersecting genes were identified using DESeq2 and WGCNA. KEGG enrichment analysis revealed that this gene set was mainly enriched in pathways such as bile secretion, mineral absorption, nitrogen metabolism, proximal tubule bicarbonate reclamation, pentose and glucuronate interconversions, steroid hormone biosynthesis, retinol metabolism, and drug metabolism-cytochrome P450. GO analysis showed enrichment in cellular metabolic processes, single hormone metabolic processes, steroid metabolic processes, the apical part of the cell, UDP-glycosyltransferase activity, and hexosyltransferase activity. TMLE identified 22 genes with a causal effect on colon cancer, of which 17 have been verified to influence the mechanism of colon cancer. Three genes were found to have causal effects on the survival prognosis of colon cancer, influencing survival directly or indirectly. In the colon cancer survival analysis, a Cox proportional hazards model built on the 23 variables screened by TMLE was compared with a Cox proportional hazards model built by multifactorial stepwise selection. The latter model performed better. It was used to evaluate the risk of colon cancer patients, who were divided into high-risk and low-risk groups according to the median risk score. The number of deaths rose as the risk score increased, and it was significantly higher in the high-risk group than in the low-risk group. Based on this model, ROC curves for the survival status of colon cancer patients at 1, 3, and 5 years gave AUC values of 0.72, 0.67, and 0.76, respectively. In the comparison of the DeepSurv, Cox, and Nnet-survival models on the test set, the concordance index of DeepSurv was 0.73, while those of the Cox and Nnet-survival models were 0.69 and 0.70, respectively, indicating that DeepSurv had stronger predictive ability than the other two models. The aim of this study was to use general bioinformatics analysis and targeted maximum likelihood estimation to identify genes with causal effects on the occurrence and survival prognosis of colon cancer, thereby narrowing the scope of subsequent experiments, improving efficiency, shortening the experimental cycle, and providing a methodological reference for genomic causal analysis. |
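
The abstract evaluates the survival models with the concordance index and stratifies patients at the median risk score. As a rough illustration of those two steps only, the following minimal sketch computes Harrell's C-index and a median split on toy data. The function names `concordance_index` and `split_by_median`, the toy values, and the simplified handling of tied survival times are assumptions made for illustration; this is not the thesis's actual code or data.

```python
import numpy as np


def concordance_index(time, event, risk):
    """Harrell's C-index (simplified sketch; pairs with tied times are skipped).

    time  : observed follow-up times
    event : 1 if death was observed, 0 if censored
    risk  : predicted risk scores (higher = expected to die sooner)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier death has the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median(risk):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    risk = np.asarray(risk, dtype=float)
    return np.where(risk > np.median(risk), "high", "low")


# Hypothetical toy cohort: four patients, one censored.
toy_time = [2, 4, 6, 8]
toy_event = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]

c_index = concordance_index(toy_time, toy_event, toy_risk)
groups = split_by_median(toy_risk)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.73, 0.69, and 0.70 are read.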