Background: Retinopathy of prematurity (ROP) is a proliferative disease of the retinal vessels that occurs in preterm and low-birth-weight infants and is the most common cause of blindness and low vision in infants and children. Timely screening and treatment are important for reducing blindness and visual impairment caused by ROP. Current domestic and international ROP screening criteria are based mainly on gestational age and birth weight; they are relatively lenient, and screening efficiency is low. In addition, fundus screening equipment is expensive, the examination requires experienced medical and nursing staff, and it is invasive. Excessive examinations not only place heavy pressure on the medical system, society, and families but also may not benefit preterm infants. Given the growing number of surviving preterm infants and the shortage of specialized ophthalmic equipment and physicians, how to assess the likelihood of ROP reasonably and minimize the amount of screening without missing affected infants is an urgent problem. Machine learning is an active research topic in medicine and is widely applied in ophthalmology. We hope to find a new method for predicting the occurrence of ROP by applying machine learning.

Purpose:
1. To analyze potential risk factors for ROP and identify the combination of indicators that best predicts its occurrence.
2. To build a machine learning-based model for predicting the risk of developing ROP.

Methods: Data on preterm infants were retrieved from our clinical research database. Clinical data for 642 preterm infants (126 with ROP and 516 without ROP) were extracted and divided into training and testing sets at a ratio of 4:1. Six machine learning algorithms, Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), Complement Naive Bayes (CNB), and Light Gradient Boosting Machine (LightGBM), were each used to construct a prediction model for the risk of ROP. The area under the receiver operating characteristic curve (AUC) was used to compare the predictive value of the six models. The model with the best predictive performance was selected, and its predictions were visualized and interpreted with the SHAP method.

Results:
1. Among the models generated by the six machine learning algorithms, the XGBoost model had the highest AUC (0.949; 0.96 in the training set and 0.96 in the validation set), and its prediction was also the most efficient.
2. History of severe preeclampsia, 1-min Apgar score, gestational age at birth, very low birth weight, history of blood transfusion, and history of neonatal hyperglycemia were the candidate predictors used to construct the XGBoost model.
3. SHAP summary plot analysis showed that 1-min Apgar score, gestational age at birth, very low birth weight, history of blood transfusion, and history of neonatal hyperglycemia were risk factors for the development of ROP, and that a maternal history of severe preeclampsia had a positive effect on the development of ROP.

Conclusion: The XGBoost model built with a machine learning method from history of severe preeclampsia, 1-min Apgar score, gestational age at birth, very low birth weight, history of blood transfusion, and history of neonatal hyperglycemia has good predictive value for ROP, and the model may be used for clinical screening of infants at high risk of ROP.
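
The evaluation protocol described in the Methods (a 4:1 split into training and testing sets, followed by comparison of models by AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_split` and `roc_auc` are hypothetical, the split is assumed to be stratified by outcome (the abstract does not say), and the risk scores are synthetic stand-ins for the output of a fitted model. Only the cohort sizes (126 with ROP, 516 without) come from the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes from the abstract: 126 infants with ROP, 516 without.
labels = [1] * 126 + [0] * 516
train_idx, test_idx = stratified_split(labels)

# Synthetic risk scores standing in for a fitted model's output (not real data).
score_rng = random.Random(1)
scores = [score_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

test_auc = roc_auc([labels[i] for i in test_idx], [scores[i] for i in test_idx])
print(len(train_idx), len(test_idx), round(test_auc, 3))
```

Under the stratification assumption, this split yields 514 training and 128 testing infants, with 25 ROP cases in the testing set. In practice each of the six algorithms would be fitted on the training indices and `roc_auc` applied to its predicted risks on the testing indices; the pairwise definition used here is quadratic in sample size, which is acceptable for a cohort of this scale.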