| Background: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Because the early symptoms of HCC are not obvious, patients have often lost the opportunity for surgery by the time they are diagnosed. There is therefore an urgent need to explore new diagnostic and therapeutic strategies for HCC. In our previous study, we showed that dysplastic nodules (DN) arising in the setting of chronic liver disease, such as hepatitis or cirrhosis, were associated with an increased incidence of HCC. We therefore proposed a new concept, Pre-HCC, comprising low-grade dysplastic nodules (LGDN) and high-grade dysplastic nodules (HGDN). Compared with LGDN, HGDN is associated with a higher incidence of HCC and is considered high-risk Pre-HCC. However, the key genes and potential regulatory pathways involved in the malignant transformation of high-risk Pre-HCC remain unclear. Purposes: This study analyzed bioinformatics data from the GEO, TCGA, and ICGC databases, aiming to (1) screen the differentially expressed genes (DEGs) between high-risk Pre-HCC and HCC and explore the key genes and potential molecular regulatory mechanisms of high-risk Pre-HCC; and (2) construct an HCC survival model based on the DEGs, analyze the relationship between this model and clinical features, and explore its correlation with the tumor microenvironment. Methods: (1) The GSE89377 and GSE6764 microarray datasets were downloaded from the GEO database, and differential expression analysis was performed with the R package "limma" to identify DEGs between high-risk Pre-HCC and HCC. GO and KEGG functional enrichment analyses of the DEGs were performed with the R package "clusterProfiler" (version 3.14.3). (2) A protein-protein interaction (PPI) network of the DEGs was constructed using the STRING database, and the top 10 hub genes were identified with the cytoHubba plugin in Cytoscape 3.7.2. The gene expression and clinical data
of 365 HCC patients were downloaded from the TCGA database as the training cohort, and the top 10 genes were evaluated by univariate and multivariate Cox regression analyses. The optimal HCC survival prediction model was then established by LASSO-Cox regression using the R package "glmnet". (3) TCGA HCC patients were divided into high-risk and low-risk groups according to the risk score of the survival prediction model. Kaplan-Meier (K-M) survival curves were generated with the R package "survival", and time-dependent ROC curves were drawn to evaluate the predictive ability of the model. (4) We also analyzed the correlation between the clinical characteristics of TCGA HCC patients and the HCC survival model. To determine whether the model generalizes to other cohorts, we obtained data on 235 HCC patients from the ICGC database as a validation cohort. (5) We then calculated scores for 22 immune cell types and 2 stromal cell types in TCGA HCC patients and analyzed differences in the tumor microenvironment between the high- and low-risk groups. (6) Gene set enrichment analysis (GSEA) was conducted to elucidate the underlying mechanisms in the high- and low-risk groups. Results: (1) A total of 268 DEGs were detected, of which 57 were upregulated and 211 were downregulated. GO and KEGG enrichment analyses revealed that the DEGs were enriched in extracellular structure, various catalytic enzyme activities, various metabolic processes (such as lipid and amino acid metabolism), the complement and coagulation cascades, the PPAR signaling pathway, the cell cycle, and DNA replication. Several important molecular functions and cellular components were also enriched. (2) CytoHubba identified AURKA, AURKB, NUSAP1, MELK, CCNB2, PRC1, TOP2A, PTTG1, UBE2C, and NCAPG as the top 10 hub genes among the DEGs between high-risk Pre-HCC and HCC. LASSO-Cox analysis established an optimal four-gene survival model: Risk Score = -0.3852 × NUSAP1 + 0.2149 × MELK + 0.1346 × PTTG1 + 0.5585 × NCAPG. (3) According to the risk score, HCC patients were
divided into high-risk and low-risk groups. Kaplan-Meier survival curves showed that patients in the low-risk group had better overall survival than those in the high-risk group (P = 5.4E-7). Time-dependent ROC analysis indicated that the risk model retained its predictive capacity over 5 years: the areas under the curve (AUCs, with 95% confidence intervals) for 1-, 2-, 3-, and 5-year survival were 0.76 (0.69-0.83), 0.74 (0.67-0.80), 0.71 (0.64-0.79), and 0.70 (0.61-0.79), respectively, indicating good ability to predict the early survival of HCC patients. (4) Univariate and multivariate Cox regression analyses revealed that worse tumor nodule status, more advanced stage, lower BMI, higher AFP, and higher risk scores were associated with worse prognosis of HCC (P < 0.05), whereas gender, age, pathological grade, Child-Pugh grade, and fibrosis grade were not significantly associated with HCC prognosis. (5) In the validation cohort, we calculated the risk score of each of the 235 ICGC HCC patients using the four-gene survival model and divided them into high-risk and low-risk groups. Overall survival was significantly lower in the high-risk group than in the low-risk group (P = 3.4E-5). The AUCs (95% confidence intervals) for predicting 1-, 2-, and 3-year overall survival were 0.70 (0.57-0.84), 0.72 (0.62-0.82), and 0.70 (0.59-0.81), respectively, indicating that the model also predicted early survival well in the validation cohort. (6) In addition, tumor microenvironment analysis showed that the risk score was positively correlated with memory B cells, follicular helper T cells, regulatory T cells (Tregs), and M0 macrophages, with correlation coefficients of 0.17, 0.33, 0.16, and 0.30, respectively. The risk score was negatively correlated with naive B cells, resting memory CD4 T cells, monocytes, M1 macrophages, endothelial cells, and fibroblasts, with correlation coefficients of -0.25, -0.21, -0.19, -0.10, -0.26,
and -0.10, respectively. (7) To further explore the relationship between pathways and the prognosis of HCC patients, GSEA was used to evaluate overall pathway trends in the high- and low-risk groups. Compared with the low-risk group, the high-risk group showed significant upregulation of the p53 signaling pathway and of pathways related to the cell cycle and DNA replication, and significant downregulation of bile acid metabolism, fatty acid metabolism, the complement and coagulation cascades, drug metabolism by cytochrome P450, and the PPAR signaling pathway. Conclusions: Our bioinformatics analysis suggests that NUSAP1, MELK, PTTG1, and NCAPG are potential key genes in high-risk Pre-HCC and are associated with poor prognosis in HCC. Using LASSO-Cox regression, we developed a four-gene prognostic risk model that predicted early HCC survival well in both the training and validation cohorts. In addition, imbalances among the interacting cell populations of the tumor microenvironment may affect the prognosis of HCC. These results provide a new research approach and a basis for further study of Pre-HCC. |
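The four-gene risk score in Results (2) is a linear predictor: each patient's score is the coefficient-weighted sum of the four genes' expression values. A minimal Python sketch follows. It assumes expression values are normalized the same way as in the training data, and it assumes a median cutoff for the high/low split, since the abstract does not state which cutoff was used. The function names and patient values are hypothetical.

```python
from statistics import median

# Coefficients of the four-gene LASSO-Cox model as reported in the abstract.
COEFFICIENTS = {
    "NUSAP1": -0.3852,
    "MELK": 0.2149,
    "PTTG1": 0.1346,
    "NCAPG": 0.5585,
}


def risk_score(expression: dict[str, float]) -> float:
    """Coefficient-weighted sum of the four genes' expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores: dict[str, float]) -> dict[str, str]:
    """Label patients 'high' or 'low' risk; the median cutoff is an assumption."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical, made-up expression values for two patients.
    cohort = {
        "P1": {"NUSAP1": 2.0, "MELK": 1.0, "PTTG1": 3.0, "NCAPG": 1.5},
        "P2": {"NUSAP1": 3.0, "MELK": 0.5, "PTTG1": 1.0, "NCAPG": 0.5},
    }
    scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
    print(scores, split_by_median(scores))
```

Patients whose score equals the cutoff fall into the low-risk group here; other tie-handling conventions are equally defensible.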
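The survival comparison in Methods (3) rests on the Kaplan-Meier estimator, and the reported P values are typically obtained with a log-rank test. The abstract does not name the test, so the log-rank test is an assumption here. The plain-Python sketch below illustrates both calculations. It is not the R "survival" package's implementation, and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each death time.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank p-value comparing group A with group B."""
    pooled = [(t, e, "a") for t, e in zip(times_a, events_a)]
    pooled += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == "a")
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == "a")
        # Observed minus expected deaths in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of d_a; undefined when only one subject is at risk.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0  # no informative event times, so no evidence of a difference
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

Applied to the high- and low-risk groups, `kaplan_meier` gives the two curves and `log_rank_p` gives a single P value for their difference.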
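Time-dependent ROC analysis asks, for a horizon t, how often a patient who died by t has a higher risk score than a patient still alive after t. The sketch below computes that cumulative/dynamic AUC in its simplest form and drops patients censored before t. This is a deliberate simplification. R packages commonly used for time-dependent ROC, such as timeROC or survivalROC, adjust for censoring with weighting or nearest-neighbor estimators, and the abstract does not say which method the study used. The function name is hypothetical.

```python
def naive_time_auc(risk, times, events, horizon):
    """Cumulative/dynamic AUC at `horizon`, without censoring adjustment.

    Cases: patients who died at or before the horizon.
    Controls: patients still under follow-up after the horizon.
    Patients censored before the horizon are dropped (a simplification).
    """
    cases = [r for r, t, e in zip(risk, times, events) if t <= horizon and e]
    controls = [r for r, t in zip(risk, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0  # case correctly ranked above control
            elif rc == rn:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))
```

Calling it with horizons of 1, 2, 3, and 5 years (in the follow-up's time unit) mirrors the structure of the AUCs reported in Results (3). Values from this naive estimator would not be expected to match the published ones exactly.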