Objective: This study investigated the factors affecting the prognosis of male and female patients with primary lung cancer, screened the features most important to prognosis in each gender, and modeled prognosis separately by gender to provide a basis for individualized treatment decisions.

Methods: This study collected 36,039 clinical records of patients with primary lung cancer from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. The demographic and clinical characteristics of male and female lung cancer patients were analyzed, and Cox regression models were used to analyze the prognostic factors for each gender. Eight classifiers were used to model the prognosis of lung cancer patients of each gender: logistic regression (LR), naive Bayes (NBayes), decision tree (DTree, CART), support vector machine (SVM), K-nearest neighbor (KNN), random forest (RF), adaptive boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost). The model with higher accuracy was selected for subsequent analysis. The XGBoost and RF algorithms were then used for feature screening to determine the optimal feature set.

Results: 1. One-year, three-year, and five-year overall survival for female patients were 91.1%, 75.5%, and 63.9%, respectively, while those for male patients were 84.2%, 63.3%, and 51.2%. One-year, three-year, and five-year cancer-specific survival for male patients were 87.9%, 71.5%, and 63.1%, respectively, while those for female patients were 93.2%, 81.0%, and 72.8%. 2. In male patients, the XGBoost classifier achieved the highest accuracy in predicting one-year overall survival (Acc = 0.8535) and five-year overall survival (Acc = 0.7438). Similarly, the XGBoost classifier also achieved the highest accuracy in
predicting one-year cancer-specific survival (Acc = 0.8857) and three-year cancer-specific survival (Acc = 0.7246). Among female lung cancer patients, the XGBoost classifier achieved the highest accuracy in predicting three-year overall survival (Acc = 0.7462), five-year overall survival (Acc = 0.6885), one-year cancer-specific survival (Acc = 0.9331), and five-year cancer-specific survival (Acc = 0.7068). 3. Among men, in predicting one-year, three-year, and five-year overall survival and one-year cancer-specific survival, the prediction results of the best feature set obtained by the XGBoost model were 0.03%, 0.35%, 0.04%, and 0.04% higher, respectively, than those of the best feature set obtained by the RF model. 4. Among women, in predicting one-year overall survival, three-year cancer-specific survival, and five-year cancer-specific survival, the prediction results of the best feature set obtained by the XGBoost model were 0.09%, 0.02%, and 0.05% higher, respectively, than those of the best feature set obtained by the RF model. In predicting five-year overall survival and one-year cancer-specific survival, the best feature set obtained by the XGBoost model contained 4 and 2 fewer features, respectively, than the best feature set obtained by the RF model. 5. For male patients, when predicting one-year, three-year, and five-year overall survival and one-year, three-year, and five-year cancer-specific survival, the best feature sets obtained by XGBoost contained 13 (Acc = 0.8536), 9 (Acc = 0.6800), 12 (Acc = 0.7437), 11 (Acc = 0.8863), 15 (Acc = 0.7267), and 12 (Acc = 0.7229) features, respectively. For female patients, the best feature sets obtained by XGBoost for the same six endpoints contained 7 (Acc = 0.9141), 14 (Acc = 0.7468), 11 (Acc = 0.6896), 8 (Acc = 0.9331), 11 (Acc = 0.7929), and 13 (Acc = 0.7084) features, respectively.

Conclusions: 1. In primary lung cancer, women’s one-year, three-year, and five-year overall survival and cancer-specific survival were better than those of men. 2. In predicting one-year overall survival, three-year overall survival, and one-year cancer-specific survival for male and female patients with primary lung cancer, the best feature set obtained by the XGBoost algorithm contained fewer features than that obtained by RF and achieved higher accuracy. 3. With XGBoost modeling, the prediction results for female lung cancer patients were better than those for male patients in one-year overall survival, three-year overall survival, one-year cancer-specific survival, and three-year cancer-specific survival, whereas the prediction results for male patients were better than those for female patients in five-year overall survival and five-year cancer-specific survival. 4. The best feature sets obtained by the XGBoost algorithm differed between male and female patients with primary lung cancer. Therefore, it is necessary to establish separate prognostic models for male and female patients with primary lung cancer.
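
The gender comparison stated in Conclusion 3 can be checked directly against the accuracies listed in Result 5. The following is a minimal sketch for that arithmetic only; the endpoint labels and the helper name `compare_by_gender` are illustrative and not taken from the study, and the code does not reproduce the study's models.

```python
# Accuracies of the best XGBoost feature sets, copied from Result 5.
# Endpoint labels are illustrative shorthand (OS = overall survival,
# CSS = cancer-specific survival); they are not the study's own names.
ENDPOINTS = ["OS-1y", "OS-3y", "OS-5y", "CSS-1y", "CSS-3y", "CSS-5y"]
MALE_ACC = [0.8536, 0.6800, 0.7437, 0.8863, 0.7267, 0.7229]
FEMALE_ACC = [0.9141, 0.7468, 0.6896, 0.9331, 0.7929, 0.7084]


def compare_by_gender(endpoints, male_acc, female_acc):
    """Return {endpoint: (group with higher accuracy, absolute gap)}."""
    summary = {}
    for name, m, f in zip(endpoints, male_acc, female_acc):
        if f > m:
            better = "female"
        elif m > f:
            better = "male"
        else:
            better = "tie"
        summary[name] = (better, round(abs(f - m), 4))
    return summary


summary = compare_by_gender(ENDPOINTS, MALE_ACC, FEMALE_ACC)
for name, (better, gap) in summary.items():
    print(f"{name}: higher accuracy for {better} patients (gap = {gap:.4f})")
```

With the reported values, the female group has the higher accuracy for the one-year and three-year endpoints and the male group for the two five-year endpoints, which agrees with Conclusion 3.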