An Early Warning Model Based On Stacking Integrated Machine Learning For Cervical Cancer

Posted on: 2022-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Sun
GTID: 2504306533962409
Subject: Applied Statistics
Abstract/Summary:
OBJECTIVE: Cervical cancer is one of the most common malignancies in women worldwide, and survival rates are high when it is detected early. Routine cervical screening can therefore identify the disease earlier and reduce the incidence and mortality of cervical cancer. However, underdeveloped areas usually lack sufficient medical resources for screening. This study therefore aims to develop an early warning model based on demographic, behavioral, and clinical factors that identifies women at high risk of developing cervical cancer, in order to optimize cervical screening strategies in underdeveloped regions and make better use of limited medical resources.

METHOD: Machine learning plays an important role in developing predictive models. However, overfitting is a common problem in machine learning, especially when data are limited or missing. Stacking Integrated Learning (SIML) is an advanced machine learning technique that combines multiple learning algorithms to improve predictive performance. This study uses data from the UCI public database on 858 women screened for cervical cancer in Venezuelan hospitals to develop the SIML algorithm. Missing values were imputed with random forest, and feature selection was applied to construct an expert dataset, which was then randomly divided into training data for algorithm development (80%) and test data for algorithm validation (20%). Random forest models and univariate logistic regression were used to screen predictive features for cervical cancer. Random forest models based on five class imbalance treatments were constructed on the training set, and the best-performing treatment was selected. The data processed with that treatment were then used to train 12 machine learning models (Tree Bag, RF, Xgboost, Ada Boost, SGB, Mon MLP, Reg Logistic, SLDA, KNN, LMT, Gauss Pr Radial, SVMRadial), and the prediction performance of the 12 models was compared on the validation set. Based on a comprehensive evaluation of model performance (entropy method) and a correlation test, models with good predictive performance and weak mutual correlation were selected as base models, and LMT was used as the meta-classifier (the classifier that combines the base models' outputs) to build the Stacking integrated structure. Finally, the performance of Stacking integrated models with different tuning parameters was compared on the external validation set to select the model with practical value.

RESULT: The random forest model identified 18 characteristics that predicted the occurrence of cervical cancer. Hormonal contraceptive use was the most important factor, followed by the number of pregnancies, years of smoking, and the number of sexual partners. Among the random forest models built with the five class imbalance treatments, SMOTE was selected to address the class imbalance, and the SMOTE-processed data were used to train the 12 machine learning algorithms. The final LMT-Stacking model, with LMT as the meta-classifier and Tree Bag, Mon MLP, and Xgboost as base classifiers, was the most effective at identifying the population at high risk of cervical cancer. In the validation set, the "LMT Stacking_1" model achieved a sensitivity of 0.818, a specificity of 0.819, an F1 value of 0.368, an F2 value of 0.230, and an AUC of 0.877. The "LMT Stacking_2" model achieved a sensitivity of 0.909, a specificity of 0.781, an F1 value of 0.357, an F2 value of 0.223, and an AUC of 0.876.

CONCLUSION: This study shows that SIML can be used to accurately identify women at high risk of developing cervical cancer. The model can use data obtained through patient interviews or electronic medical records, such as demographics, behavioral patterns, and historical clinical data, to optimize screening intervals and care plans for personalized screening.
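The abstract names the entropy method as the tool for comprehensive evaluation of the candidate models but does not spell out the computation. The sketch below shows one common formulation of entropy weighting, assuming every indicator is "higher is better" and using min-max normalization. The function name `entropy_weights`, the model names, and the metric values are hypothetical illustrations, not the thesis's code or results.

```python
import math


def entropy_weights(scores):
    """Entropy weighting for a models x indicators matrix (list of lists).

    Assumption: every indicator is 'higher is better'.
    Returns (weights per indicator, composite score per model).
    """
    m = len(scores)        # number of models
    n = len(scores[0])     # number of indicators
    if m < 2:
        raise ValueError("need at least two models to compute entropy")

    # Min-max normalise each indicator (column) to [0, 1].
    norm = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in scores]
        lo, hi = min(col), max(col)
        for i in range(m):
            # A constant column carries no information; mapping it to 1.0
            # gives it maximal entropy and therefore zero weight.
            norm[i][j] = 1.0 if hi == lo else (col[i] - lo) / (hi - lo)

    # Entropy of each indicator, scaled to [0, 1] by dividing by ln(m).
    divergence = []
    for j in range(n):
        total = sum(norm[i][j] for i in range(m))
        entropy = 0.0
        for i in range(m):
            p = norm[i][j] / total
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(m)
        # Lower entropy means the indicator discriminates more between models.
        divergence.append(max(0.0, 1.0 - entropy))

    d_sum = sum(divergence)
    # Fall back to equal weights if no indicator discriminates at all.
    weights = [d / d_sum if d_sum > 0 else 1.0 / n for d in divergence]
    composite = [sum(w * r for w, r in zip(weights, row)) for row in norm]
    return weights, composite


# Hypothetical validation metrics (NOT the thesis results): rows are models,
# columns are sensitivity, specificity, AUC.
names = ["model_a", "model_b", "model_c"]
metrics = [
    [0.80, 0.75, 0.85],
    [0.70, 0.85, 0.83],
    [0.90, 0.60, 0.80],
]
weights, composite = entropy_weights(metrics)
ranking = sorted(zip(names, composite), key=lambda t: t[1], reverse=True)
print(weights)
print(ranking)
```

Models with high composite scores would then be candidates for base models, subject to the correlation test the abstract describes. How the thesis normalized its indicators and combined the two criteria is not stated, so those choices here are assumptions.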
Keywords/Search Tags:Machine learning, Cervical cancer, Risk warning, Personalized screening