| Lung squamous cell carcinoma (LUSC) is a distinct pathological type of non-small cell lung cancer. LUSC patients generally have a poor prognosis because they tend to be older, are typically diagnosed at an advanced stage, and lack effective molecular targeted drugs. It is therefore urgent to find biomarkers related to LUSC prognosis that can support new treatment strategies and improve patient outcomes. Based on the GEO and TCGA databases, prognostic genes of LUSC were screened and a prognostic prediction model was established using bioinformatics methods. 1. Gene expression profiles and clinical data of LUSC were downloaded from the GEO and TCGA databases. The GEO and TCGA data were normalized and quality-controlled with the R packages "affy" and "edgeR", respectively. 2. Based on the median absolute deviation (MAD) of gene expression values in cancer samples, the top 75% of genes in each microarray dataset and in the mRNA sequencing data were selected as abundantly expressed genes. 3. The abundantly expressed genes from the four datasets were then intersected to give the final set of abundantly expressed genes. 4. Survival analysis was performed to identify candidate prognostic markers. The average expression level of each gene in normal samples was used as the grouping threshold: samples above the threshold formed the high-expression group and samples below it formed the low-expression group. In the log-rank test, P < 0.01 was considered statistically significant. 5. Following a machine learning approach, the data were divided into a training set (70%) and a validation set (30%). A Cox proportional hazards regression model with stepwise forward selection was used for multivariate analysis of the training set, and a prediction model was established. 6. The performance of the risk prediction model was verified in the validation set. From the GEO search results, we selected three microarray datasets: GSE8894, GSE30219 and GSE37745. We
extracted 75, 61 and 66 LUSC samples, respectively, from the three microarray datasets. We also downloaded publicly available mRNA sequencing data and clinical information for 551 LUSC samples (49 normal and 502 primary tumor samples) from the TCGA database. Differential analysis of gene expression levels identified 7925 stably differentially expressed genes. Further survival analysis identified 36 abundantly expressed genes related to survival. Finally, 11 key candidate prognostic genes (MRPL40, GABPB1-AS1, PTPN3, SNCA, PYGB, RAP1, VDR, PHPT1, KIAA0100, TBC1D30 and CYP7B1) were identified by multivariate Cox proportional hazards regression. |
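
The MAD-based filtering and overlap steps (steps 2 and 3) can be illustrated with a minimal Python sketch. It is not the authors' R pipeline. It assumes that genes are ranked by MAD within each dataset and that the top 75% are kept before intersecting across datasets. The function names (`mad`, `top_fraction_by_mad`, `shared_genes`) and the toy expression values are hypothetical and serve only as illustration.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of expression values."""
    center = median(values)
    return median(abs(v - center) for v in values)


def top_fraction_by_mad(expr, fraction=0.75):
    """Return the genes whose MAD ranks in the top `fraction` of one dataset.

    `expr` maps a gene name to its expression values across tumor samples.
    Ranking by MAD is an assumption about how the 75% cut-off is applied.
    """
    ranked = sorted(expr, key=lambda gene: mad(expr[gene]), reverse=True)
    keep = int(len(ranked) * fraction)
    return set(ranked[:keep])


def shared_genes(datasets, fraction=0.75):
    """Intersect the per-dataset selections to get the final gene set."""
    selected = [top_fraction_by_mad(d, fraction) for d in datasets]
    return set.intersection(*selected)


# Toy data (hypothetical): two datasets, four genes, three tumor samples each.
dataset_a = {"G1": [1, 5, 9], "G2": [2, 3, 4], "G3": [5, 5, 5], "G4": [0, 10, 20]}
dataset_b = {"G1": [3, 3, 3], "G2": [1, 4, 7], "G3": [2, 6, 10], "G4": [0, 8, 16]}

shared = shared_genes([dataset_a, dataset_b])
print(sorted(shared))
```

In the study itself the same intersection would run over four datasets (the three GEO microarray sets plus the TCGA sequencing data) rather than two.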