
Identification Of Novel Methylation Signature To Predict Prognosis In Lung Squamous Cell Carcinoma Based On TCGA Database

Posted on: 2021-04-20   Degree: Master   Type: Thesis
Country: China   Candidate: N Li
GTID: 2404330605958186   Subject: Oncology
Abstract/Summary:
Background: Lung cancer is one of the most common malignant tumors worldwide and a leading cause of cancer-related death. It is often insidious and is frequently not diagnosed until it is well advanced. Despite continuous improvement in the survival of patients with other malignant tumors, the overall survival rate of lung cancer patients has remained below 20%. According to histopathology, lung cancers are divided into two major categories: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLCs are further divided into three subcategories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and large cell carcinoma (LCC). NSCLCs account for 87% of all lung cancer cases, and a third of NSCLCs are LUSCs, a special type of lung cancer that shows poorer 5-year survival than the other types. Research in epigenetics and tumor biology has boomed in the past two decades, and in recent years considerable effort has been put into studying the significance of DNA methylation in LUSC. DNA methylation is one of the most important epigenetic modifications, and abnormal DNA methylation plays an important role in the occurrence and progression of tumors. Many DNA methylation sites have been used as biomarkers for early diagnosis, choice of treatment and prognosis evaluation. However, studies on the significance of specific DNA methylation sites in the prognosis of LUSC are still lacking.

Objective: In this study, we sought DNA methylation sites potentially correlated with the prognosis of LUSC, based on sequencing data and DNA methylation data from the TCGA and GEO databases. A model to predict the prognosis of LUSC from the methylation levels of these sites was then established using the methylation data and clinical information of the patients. Finally, the functional significance of the key sites was analyzed to guide the choice of chemotherapy regimen.

Materials and methods:
1. Data download. We downloaded DNA sequencing data of 550 LUSC and normal tissue samples, and DNA methylation data of 359 LUSC and normal tissue samples. Corresponding clinical data were also downloaded and processed into a standardized expression matrix.
2. Target gene screening. We compared the gene expression matrices of the LUSC samples and the pericancer tissue samples and identified the differentially expressed genes (DEGs). We then compared the DNA methylation sequencing results of the two types of samples to identify the differentially methylated genes (DMGs). Finally, from the intersection of the DEGs and the DMGs, we selected the genes that showed a reverse correlation between methylation and expression levels as the genes of interest.
3. Screening of prognosis-related methylation sites. To screen the prognosis-related methylation sites further and to establish and verify a prognostic model, we randomly divided the TCGA data sets at a 2:1 ratio into a training group and a test group, and used the GSE56044 data set from GEO as the validation group. Methylation data of the sites of interest and the corresponding clinical data of the training group were extracted and analyzed by univariate Cox regression (P<0.05), random survival forests algorithm-variable hunting (RSFVH), and Cox multiple regression (P<0.05) to screen the methylation sites most likely to be correlated with prognosis.
4. Construction and verification of the prognostic model. A prognostic model was constructed from the coefficients of the Cox multiple regression. The formula was as follows: Risk Score (RS) = $\sum_{i} \mathrm{Meth}_i \times \mathrm{Coef}_i$, where $\mathrm{Meth}_i$ is the methylation level of site $i$ and $\mathrm{Coef}_i$ is its coefficient in the Cox multiple regression. The training, test and validation data sets were then each divided into a high-risk group and a low-risk group using the median RS of the training group as the threshold, and K-M survival analysis and ROC analysis were performed to verify the accuracy and specificity of the prognostic model. We also conducted a nomogram analysis in the training group to clarify whether the risk score is an independent risk factor.
5. The mechanism underlying the risks predicted by the model and its significance in chemotherapy sensitivity. To clarify the cause of the differences in risk suggested by the model, we used gene set variation analysis (GSVA) to screen for signaling pathways that may differ significantly between the high-risk and low-risk groups. In addition, methylation data of the LUSC cell lines in the GDSC (Genomics of Drug Sensitivity in Cancer) database were divided into a high-risk group and a low-risk group according to the RS threshold determined in the training group, and the sensitivity of the cell lines to common chemotherapy drugs was compared between the two groups.

Results:
1. A total of 7332 DEGs and 389 DMGs were identified (|log2FC|≥1, P<0.05 and FDR<0.05), and 22 genes showed a reverse correlation between their expression and methylation levels.
2. A total of 620 valid methylation sites were identified in the 22 genes of interest. Univariate Cox regression, machine learning and Cox multiple regression were used to find 3 final sites of interest that were independently correlated with patient survival.
3. Based on the prognostic model, the samples of the training set were divided into a high-risk group and a low-risk group using the median overall risk score, and the high-risk group showed significantly poorer overall survival (OS) than the low-risk group (hazard ratio [HR]: 2.72, 95% confidence interval [CI]: 1.82-4.07, P<0.001). The effectiveness of the prognostic model was further verified in the test group and the validation group. We then used nomogram analysis to verify that these 3 methylation sites were indeed independent risk factors for the prognosis of patients with LUSC, and established a nomogram that combines the 3 methylation sites with other risk factors (gender, staging, age, etc.) to quantitatively predict the prognosis of patients with LUSC.
4. Finally, we analyzed the genes that were differentially expressed between the high-risk and low-risk groups of the training set by GSVA and found that the DEGs were significantly enriched in the cell cycle/mitosis, tyrosine kinase receptor, extracellular regulatory protein kinase/mitogen-activated protein kinase and other pathways (non-parametric method, P<0.05), which may underlie the different prognosis of the two groups. In addition, the LUSC cell lines in the GDSC database were divided into a high-risk group and a low-risk group according to the RS threshold for comparison of chemotherapy sensitivity. The results suggested that cell lines of the high-risk group were more sensitive to gemcitabine and docetaxel than those of the low-risk group.

Conclusion:
1. Using a data set from the TCGA database, we screened 3 methylation sites (cg06675147, cg07064331 and cg20429172) that were independently correlated with the prognosis of patients with LUSC, and established a prognostic model for LUSC that divides patients or samples into a high-risk group and a low-risk group.
2. The accuracy and sensitivity of the model were further verified using a test data set and a validation data set from the TCGA and GEO databases, confirming that the 3 methylation sites were independent prognostic factors for LUSC.
3. GSVA using the KEGG, BIOCARTA and REACTOME databases further demonstrated that the high-risk and low-risk groups showed significant differences in the cell cycle/mitotic, ERBB and ERK/MAPK signaling pathways, which may underlie the different prognosis of the two groups.
4. Finally, using drug sensitivity data from the GDSC database, we found that cell lines of the high-risk group were more sensitive to gemcitabine and docetaxel, suggesting that patients in the high-risk group may be more responsive to these two drugs.
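The risk score and the median-based grouping described in the methods can be illustrated with a minimal Python sketch. It is not the thesis's code: the abstract does not report the Cox coefficients, so the values in `COEF` are hypothetical, as are the sample methylation values and the helper names `risk_score` and `assign_group`. Treating a score exactly equal to the threshold as low-risk is also an assumption.

```python
from statistics import median

# The three CpG sites named in the abstract.
SITES = ("cg06675147", "cg07064331", "cg20429172")

# HYPOTHETICAL coefficients for illustration only; the abstract does not
# report the fitted Cox multiple regression coefficients.
COEF = {"cg06675147": 1.5, "cg07064331": -2.0, "cg20429172": 0.8}


def risk_score(meth):
    """Risk score = sum over sites of methylation level * Cox coefficient."""
    return sum(meth[site] * COEF[site] for site in SITES)


def assign_group(score, threshold):
    """Label a sample 'high' if its score exceeds the threshold, else 'low'.

    Ties go to 'low'; the abstract does not say how ties are handled.
    """
    return "high" if score > threshold else "low"


# Made-up methylation levels (beta values in [0, 1]) for a toy training set.
train = [
    {"cg06675147": 0.10, "cg07064331": 0.80, "cg20429172": 0.20},
    {"cg06675147": 0.60, "cg07064331": 0.30, "cg20429172": 0.50},
    {"cg06675147": 0.90, "cg07064331": 0.10, "cg20429172": 0.70},
    {"cg06675147": 0.40, "cg07064331": 0.50, "cg20429172": 0.40},
]
train_scores = [risk_score(sample) for sample in train]

# The threshold is the median risk score of the TRAINING group only.
threshold = median(train_scores)
train_groups = [assign_group(s, threshold) for s in train_scores]

# Test or validation samples reuse the training threshold unchanged.
test = [
    {"cg06675147": 0.85, "cg07064331": 0.15, "cg20429172": 0.60},
    {"cg06675147": 0.20, "cg07064331": 0.70, "cg20429172": 0.30},
]
test_groups = [assign_group(risk_score(sample), threshold) for sample in test]

print(train_groups, test_groups)
```

The point of the sketch is that the threshold is fixed once on the training group and then applied as-is to the test group, the validation group and the GDSC cell lines, so the grouping of new samples never depends on their own score distribution.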
Keywords/Search Tags:Lung Squamous Cell Carcinoma, Overall Survival, Prognosis, Signature, Methylated Sites