| Background: Gastric cancer (GC) is one of the most common cancers worldwide. According to the Global Cancer Epidemiology Statistics (GLOBOCAN) published in 2018, gastric cancer ranks fifth in global incidence among all new cancer cases, with more than 1.03 million new cases, and second among cancer-related deaths, with up to 780,000 deaths. In China, gastric cancer is the second most common malignant tumor, behind only lung cancer. Most patients are already at an advanced stage when diagnosed and often have a poor prognosis. Therefore, a better understanding of the molecular mechanisms of gastric cancer development and the identification of appropriate molecular targets related to prognosis are of great significance for developing treatment strategies and improving outcomes. Objective: We obtained DNA microarray datasets from the Gene Expression Omnibus (GEO) database, used weighted gene co-expression network analysis (WGCNA) to screen for relevant hub genes, and constructed a risk score model to predict the 3-year and 5-year overall survival of patients, providing a reference for the prognosis of gastric cancer patients. Methods and results: Gene expression data for 300 gastric cancer patients were obtained from the GSE62254 series in the GEO database. Clinical data for these 300 patients were obtained from the supplementary materials of published articles, and 298 patients with complete clinical information were selected. Using R software, probes mapping to the same gene were merged by their median expression, yielding a total of 20,183 genes. The variance of each gene across samples was calculated, the top 25% of genes with the largest variance were retained, and non-coding genes were removed. In total, 4,718 coding genes were obtained for further analysis. The WGCNA method was used to identify the most relevant gene module, and the turquoise module with the 
highest correlation with overall survival (OS) among all modules was selected, comprising 1,385 genes. Univariate Cox regression identified 494 genes in the turquoise module that were associated with OS (p < 0.01). The 298 samples in GSE62254 were then randomly split at a 7:3 ratio into a training group (n = 202) and a validation group (n = 96). In the training group, LASSO regression and multivariate Cox regression (stepwise method) were used to construct a Cox proportional hazards regression model, yielding a risk score formula composed of 9 genes: Riskscore = 0.3705 * EXP(DIO2) + 0.2436 * EXP(DNAJC6) + 0.5282 * EXP(FSTL3) + 0.2327 * EXP(RGN) + 0.1699 * EXP(SPINK2) - 0.0911 * EXP(CT83) - 0.3356 * EXP(KCNRG) - 0.3538 * EXP(PRELID2) - 0.3483 * EXP(TUBA4A), where EXP(gene) denotes the expression level of that gene. To assess the applicability of the model, we evaluated its predictive performance in the validation group and in the total sample. In the validation group, the areas under the ROC curve (AUC) for 3-year and 5-year survival prediction were 0.696 and 0.700, respectively; in the total sample, they were 0.750 and 0.763. The optimal risk score cutoff, determined from the ROC curve of the training group, was 1.109681. Using this threshold, the validation group and the total sample were each divided into high-risk and low-risk groups. Kaplan-Meier survival analysis showed that the high-risk group had worse overall survival than the low-risk group (validation group: log-rank p = 0.014; total sample: log-rank p < 0.0001). Patients were further stratified into stage I-II and stage III-IV, and Kaplan-Meier curves were drawn for the high-risk and low-risk groups defined by the same threshold. OS still differed significantly between the risk groups (stage I-II: log-rank p = 0.0031; stage III-IV: log-rank p < 0.0001). Multivariate Cox regression, adjusted for age, gender, pathological stage, pathological type, and other factors, gave results suggesting 
that the prognostic risk prediction model is an independent prognostic factor for patients with gastric cancer (p = 8.87E-11). Conclusion: 1. This study fitted a risk prediction model based on 9 genes, which has some guiding value for assessing the prognosis of gastric cancer patients. 2. The 9-gene prognostic prediction model is an independent prognostic factor for gastric cancer patients. |
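
The 9-gene risk score and the high/low-risk split described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the abstract reports that the analysis was done in R). The function names `risk_score` and `risk_group`, the example expression values, and the choice to treat a score strictly above the threshold as high risk are all assumptions. The sign of the RGN term is not legible in the printed formula, so a positive sign is assumed here because RGN appears among the positively weighted genes.

```python
# Coefficients as reported in the abstract's risk score formula.
# ASSUMPTION: the sign of the RGN coefficient is missing in the source text;
# "+" is assumed because RGN is listed with the positively weighted genes.
COEFFICIENTS = {
    "DIO2": 0.3705,
    "DNAJC6": 0.2436,
    "FSTL3": 0.5282,
    "RGN": 0.2327,  # assumed sign
    "SPINK2": 0.1699,
    "CT83": -0.0911,
    "KCNRG": -0.3356,
    "PRELID2": -0.3538,
    "TUBA4A": -0.3483,
}

# Cutoff reported in the abstract (derived from the training-group ROC curve).
THRESHOLD = 1.109681


def risk_score(expression):
    """Return the weighted sum of expression values for the 9 model genes.

    `expression` maps gene symbol -> expression level, assumed to be on the
    same scale as the data used to fit the model.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(expression, threshold=THRESHOLD):
    """Classify a patient as 'high' or 'low' risk.

    ASSUMPTION: scores strictly above the threshold count as high risk; the
    abstract does not say how a score exactly at the cutoff is handled.
    """
    return "high" if risk_score(expression) > threshold else "low"


if __name__ == "__main__":
    # Hypothetical expression profile, for illustration only.
    patient = {
        "DIO2": 6.1, "DNAJC6": 5.4, "FSTL3": 7.2, "RGN": 4.8, "SPINK2": 5.0,
        "CT83": 3.9, "KCNRG": 4.2, "PRELID2": 6.5, "TUBA4A": 8.0,
    }
    print(f"risk score = {risk_score(patient):.4f}, group = {risk_group(patient)}")
```

Because the score is a plain linear combination, it is only meaningful when the input expression values are preprocessed the same way as the training data (here, median-merged probe values from GSE62254); applying the reported cutoff to differently normalized data would need recalibration.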