Research On Diagnosis And Prediction Of Prostate Cancer Based On XGBoost Algorithm

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Li
Full Text: PDF
GTID: 2404330620971635
Subject: Computer technology
Abstract/Summary:
At present, early clinical screening for prostate tumors relies mainly on the prostate-specific antigen (PSA) test, but the sensitivity and specificity of PSA-based diagnosis are not ideal. This thesis applies machine learning algorithms and data mining methods to prostate cancer data sets from the Clinical Medical Science Data Center (Beijing 301 Hospital). It combines PSA with routine blood, biochemical, and routine urine test results to screen for prostate tumors and to identify risk factors for prostate cancer.

First, missing values in the data are handled with a random forest model. Statistical analysis is then applied to single features and to feature combinations, and its results are used to build clinically meaningful combined features that improve the generalization ability of the model. Features that correlate strongly with the target, as measured by the Pearson correlation coefficient, are selected for modeling. The data set is also up-sampled with the SMOTE algorithm to address class imbalance, and the preprocessed samples serve as the training and test sets for the prediction models.

The prediction models are based on the random forest, AdaBoost, and XGBoost algorithms, and they are evaluated with recall, accuracy, F1-score, and the ROC curve. Based on a comprehensive evaluation of the experimental results and the confusion matrices, a prostate tumor prediction model based on the XGBoost algorithm is constructed, with a recall of 0.98 and an accuracy of 0.91. The importance and clinical significance of features such as the free-to-total PSA ratio, inorganic phosphorus, free PSA, and apolipoprotein E are reported. Comparative experiments further examine how the evaluation metrics of the XGBoost model change with the number of input features, providing a scientific basis for optimizing clinical diagnostic data.

In summary, this thesis proposes applying an XGBoost model trained on SMOTE-processed data to the diagnosis and prediction of prostate cancer, and uses comparative experiments to identify the features relevant to diagnosing prostate cancer and their clinical significance.
Keywords/Search Tags:SMOTE, prostate cancer, XGBoost algorithm, feature combination, data cleaning
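
The preprocessing and evaluation steps named in the abstract (Pearson-based feature selection, SMOTE up-sampling, and metrics derived from a confusion matrix) can be illustrated with a minimal standard-library sketch. This is not the thesis's code: the function names, the correlation threshold of 0.3, the neighbour count `k`, and the toy data are all assumptions chosen for illustration, and a real pipeline would more likely rely on established libraries for these steps.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(rows, labels, threshold=0.3):
    """Indices of columns whose |r| with the label reaches the threshold.

    The threshold value is an assumption, not taken from the thesis.
    """
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if abs(pearson([r[j] for r in rows], labels)) >= threshold]


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (basic SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def scores(tp, fp, fn, tn):
    """Recall, precision, accuracy, and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Toy data (hypothetical): two features, binary label.
    rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0]]
    labels = [0, 0, 1, 1]
    print("selected columns:", select_features(rows, labels))
    print("synthetic samples:", smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], 2, k=2))
    print("metrics:", scores(tp=8, fp=2, fn=2, tn=8))
```

In practice, up-sampling should be applied only to the training split so that synthetic points do not leak into the test set; whether the thesis did so is not stated in the abstract.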