| Cancer is now the second leading cause of death in humans, and the number of newly diagnosed cases continues to rise. Meanwhile, cancer patients are trending younger. Studies have found that at least 22 types of cancer are hereditary, and that as many as 58% of melanoma cases are linked to heredity. Gene-level analysis is therefore very important for cancer diagnosis and treatment. Gene analysis methods commonly used at home and abroad include differential expression analysis, feature extraction, feature selection, survival analysis and GO enrichment analysis. In this study, the gene expression data and clinical data of 467 skin malignant melanoma tissues from the TCGA database are combined with the gene expression data of 500 healthy skin tissues from the GTEx database for correlation analysis. 1. Differential gene expression analysis. Differential gene expression analysis was performed with DESeq2 on the gene expression data of the 467 skin malignant melanoma tissues and the 500 healthy skin tissues. The results showed 13507 genes with a significant expression difference of more than 2-fold, including 2575 genes that differed by more than 10-fold, of which 1047 were up-regulated and 1528 were down-regulated. This indicates that gene expression differs greatly between malignant melanoma tissue and healthy skin tissue. 2. Filter and wrapper feature selection methods were used to choose the feature genes. (1) Filter feature selection. First, a filter feature selection method was used for preliminary gene screening: in the first step, the Spearman correlation coefficient was used to remove genes unrelated to skin malignant melanoma. Under the conditions of significance test P < 0.1 and Spearman correlation coefficient greater than 0.4, 13566 genes were retained. Since the dimension reduction achieved in the first step was limited, the mRMR algorithm was used in the second step
and the threshold was set to 500, that is, 500 genes were retained. (2) Wrapper feature selection. Three wrapper feature selection algorithms, Random Forest recursive feature elimination (RF-RFE), Treebag recursive feature elimination (Treebag-RFE) and Random Forest simulated annealing (RF-SA), were used to screen feature genes. The three methods retained 30, 57 and 103 genes, respectively. (3) Classification ability evaluation. The gene sets were compared mainly by the classification performance of a nonlinear support vector machine with a Gaussian kernel. The evaluation metrics were classification accuracy, precision, recall, F-measure and AUC. The comparison showed that the 30 feature genes screened by RF-RFE formed the smallest set and gave the best classification performance, so these 30 feature genes were retained for survival analysis. 3. Combined with the clinical data of patients with cutaneous malignant melanoma, the 30 feature genes were used as covariates for survival analysis. (1) Cox proportional hazards regression. Univariate Cox regression analysis was conducted for each of the 30 feature genes. With the proportional hazards (PH) assumption satisfied and the significance threshold set to P = 0.1, nine genes were identified: CITED, AP1S2, USP11, SDC3, SNX10, EN2, EOMES, CHST11 and FOXRED2. (2) K-M survival analysis. Kaplan-Meier (K-M) survival analysis was performed on the 9 genes found to be significantly related to patient survival in the Cox regression analysis, and K-M survival curves were drawn. In the end, high versus low expression of SNX10, AP1S2, EN2 and USP11 had a significant impact on patient survival. The purpose of this study was to identify genes related to cutaneous malignant melanoma by analyzing gene expression in cutaneous malignant melanoma tissue, for use in the diagnosis of cutaneous malignant melanoma. Survival analysis was performed to obtain the genes significantly associated with patient survival, so as to
be used in the treatment and prognosis of patients with cutaneous malignant melanoma. |
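The K-M step described above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the abstract does not say how patients were divided into high- and low-expression groups, so the median cutoff used here is an assumption, the function names `kaplan_meier` and `split_by_median` are hypothetical, and the six patients are invented toy values rather than TCGA data. The sketch only estimates the survival curves; a significance test between the two groups (for example a log-rank test) is not included.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Only times with at least one observed death appear in the output.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


def split_by_median(expression, times, events):
    """Split patients into high/low groups at the median expression.

    ASSUMPTION: the median cutoff is illustrative only; the source does
    not state which cutoff was used.
    Returns ((high_times, high_events), (low_times, low_events)).
    """
    cutoff = median(expression)
    high_t, high_e, low_t, low_e = [], [], [], []
    for x, t, e in zip(expression, times, events):
        if x > cutoff:
            high_t.append(t)
            high_e.append(e)
        else:
            low_t.append(t)
            low_e.append(e)
    return (high_t, high_e), (low_t, low_e)


# Toy data (invented): expression of one gene, follow-up in months, status.
expression = [5.1, 2.0, 7.3, 1.2, 6.8, 3.3]
times = [10, 24, 6, 30, 12, 18]
events = [1, 0, 1, 0, 1, 1]

(high_t, high_e), (low_t, low_e) = split_by_median(expression, times, events)
print("high expression:", kaplan_meier(high_t, high_e))
print("low expression: ", kaplan_meier(low_t, low_e))
```

In practice a survival analysis library would normally be used for both the curves and the group comparison; the hand-written estimator is shown only to make the product-limit calculation explicit.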