
Identification of Metastatic Genes in Osteosarcoma and Development of a Survival Predictive Model Based on Machine Learning

Posted on: 2022-02-07   Degree: Master   Type: Thesis
Country: China   Candidate: J Zhou
GTID: 2480306554983849   Subject: Surgery
Abstract/Summary:
Purposes: To identify metastasis-related and prognostic gene signatures in patients with osteosarcoma, and to evaluate the performance of survival predictive models developed with machine learning.

Methods:
1. Clinical survival data and gene expression data for osteosarcoma cases were obtained from the TARGET database. Patients were divided into a metastatic group and a non-metastatic group according to their status at initial diagnosis. The data were then cleaned to retain only cases with both gene expression data and clinical survival data. These patients were assigned to a training group and a testing group at a ratio of 7:3 using a random number table. The Kaplan-Meier method and the log-rank test were used to compare survival rates between groups. Differentially expressed genes (DEGs) between the metastatic and non-metastatic groups were identified, and univariate Cox survival analysis was used to select the DEGs associated with survival (P < 0.05).
2. The DEGs selected by univariate Cox analysis were used in the training group to develop four machine learning models: Lasso-Cox proportional hazards regression, Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The predictive performance of the four models was validated internally in the testing group by calculating the accuracy rate and the area under the receiver operating characteristic curve (AUROC).

Results:
1. A total of 272 osteosarcoma cases were included: 68 in the metastatic group (mean age 14.5 ± 4.06 years) and 204 in the non-metastatic group (mean age 17.1 ± 10.69 years). The log-rank test showed a significant difference in survival rate between the metastatic and non-metastatic groups (P < 0.05). After data cleaning, 95 cases had both gene expression data and clinical survival data: 63 in the training group (mean age 14.72 ± 4.47 years) and 32 in the testing group (mean age 16.72 ± 6.56 years). There was no significant difference in survival rate or survival status between the training and testing groups (P > 0.05). A total of 371 DEGs were identified between the metastatic and non-metastatic groups, of which 267 were highly expressed and 104 were lowly expressed. Univariate Cox survival analysis selected 37 significant genes (P < 0.05).
2. Lasso regression analysis selected ten genes, and multivariate Cox regression analysis then established a five-gene (MYC, FAM166B, SOSTDC1, FMO2, and BPIFB1) Lasso-Cox proportional hazards regression model. In the training and testing groups, respectively, the AUROC values were 0.807 and 0.742 for 1-year survival, 0.848 and 0.645 for 3-year survival, and 0.879 and 0.660 for 5-year survival, and the C-index was 0.770 and 0.671. The SVM model was built on 6 genes (ARX, SOSTDC1, TSPY4, ITK, ALDH1A1, and RBMY1B) identified by support vector machine recursive feature elimination (SVM-RFE); its accuracy rates in the training and testing groups were 0.746 and 0.594, and its AUROC was 0.585. The top five genes by feature importance in the RF model were FMO2, TSPY9P, DDN, ALDH1A1, and MYC; its accuracy rates in the training and testing groups were 0.762 and 0.750, and its AUROC was 0.709. The top five genes by feature importance in the XGBoost model were FMO2, DDN, TSPY1, ABCA4, and MYC; its accuracy rates in the training and testing groups were 0.714 and 0.750, and its AUROC was 0.700. MYC and FMO2 were both included in the Lasso-Cox model, and both ranked among the top five features by importance in the RF and XGBoost models.

Conclusions:
1. Among the machine learning survival predictive models for osteosarcoma, the RF and XGBoost models showed good and similar predictive performance, better than that of the Lasso-Cox proportional hazards regression model, whereas the SVM model performed poorly.
2. MYC and FMO2 may be related to the metastasis and prognosis of osteosarcoma.
3. Patients with metastatic osteosarcoma have a worse prognosis than patients without metastasis.
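The Kaplan-Meier method and log-rank test named in the Methods can be illustrated with a minimal plain-Python sketch. This is not the thesis's code: the function names `kaplan_meier` and `logrank_test` are hypothetical, censored cases are assumed to be coded as event = 0, and the p-value is taken from the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as a list of (event_time, S(t)) pairs.

    Hypothetical helper. times: follow-up times; events: 1 if death was
    observed, 0 if the case was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (hypothetical helper); returns (chi2, p), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        n = n_a + n_b
        # Observed minus expected deaths in group A at this time point.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A death count.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, `logrank_test(t_meta, e_meta, t_nonmeta, e_nonmeta)` would compare the metastatic and non-metastatic groups; those variable names are hypothetical placeholders for per-patient follow-up times and event indicators.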
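The univariate Cox screen described in the Methods (keeping DEGs with P < 0.05) can be sketched as one Cox model per gene, fitted by Newton-Raphson on the partial likelihood with a Wald test on the coefficient. This is an illustrative assumption about how such a screen works, not the thesis's implementation: the names `univariate_cox` and `screen_genes` are hypothetical, ties are handled with the Breslow approximation, and a covariate that perfectly separates the outcomes (infinite coefficient) is not handled.

```python
import math


def _cox_terms(beta, times, events, x):
    """Breslow partial log-likelihood, score and information for one covariate."""
    loglik = score = info = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        # Risk set: everyone still under observation at time t_i.
        s0 = s1 = s2 = 0.0
        for j in range(n):
            if times[j] >= times[i]:
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] * x[j]
        mean = s1 / s0
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return loglik, score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t|x) = h0(t) * exp(beta * x); return (beta, hazard_ratio, wald_p)."""
    beta = 0.0
    loglik, score, info = _cox_terms(beta, times, events, x)
    if info <= 0:
        raise ValueError("covariate does not vary within any risk set")
    for _ in range(max_iter):
        step = score / info
        new = _cox_terms(beta + step, times, events, x)
        halvings = 0
        # Step halving: never accept a move that lowers the partial likelihood.
        while new[0] < loglik and halvings < 30:
            step /= 2
            new = _cox_terms(beta + step, times, events, x)
            halvings += 1
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # Wald statistic: beta / SE, with SE = 1/sqrt(I)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, math.exp(beta), p


def screen_genes(times, events, expression, alpha=0.05):
    """Keep genes whose univariate Cox p-value is below alpha.

    expression: hypothetical dict mapping gene name -> per-patient values.
    """
    kept = {}
    for gene, values in expression.items():
        _, hazard_ratio, p = univariate_cox(times, events, values)
        if p < alpha:
            kept[gene] = (hazard_ratio, p)
    return kept
```

A hazard ratio above 1 means higher expression is associated with shorter survival; in practice a dedicated survival library would also report confidence intervals and handle ties and separation more carefully.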
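The 7:3 training/testing split and the accuracy, AUROC, and C-index figures reported above can be illustrated with a small stdlib-only sketch. It is not the thesis's code: a seeded pseudo-random shuffle stands in for the random number table, the AUROC assumes a binary outcome label (for example, death during follow-up) with higher scores meaning higher risk, and the C-index is Harrell's version that simply skips pairs with tied follow-up times. All function names are hypothetical.

```python
import random


def split_70_30(patient_ids, seed=2022):
    """Randomly assign about 70% of patients to training and the rest to testing."""
    rng = random.Random(seed)  # seeded PRNG as a stand-in for a random number table
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted class matches the observed class."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUROC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risk):
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter follow-up time than patient j.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

An AUROC or C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values above (for example, 0.709 for RF and 0.585 for SVM) are compared.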
Keywords/Search Tags: Osteosarcoma, Differentially Expressed Genes, Machine Learning, Survival Predictive Model