| Atherosclerosis is the main cause of cardiovascular disease and can lead to serious consequences such as heart disease and stroke. To evaluate the risk of atherosclerosis in patients effectively and to improve the accuracy and interpretability of risk assessment, this study developed a small-sample, interpretable prediction model for atherosclerosis risk based on real clinical data. The research consists of three parts: (1) For small-sample data, we improved the synthetic minority oversampling technique (SMOTE) to generate more stable and effective new samples, and proposed the K-Means boundary oversampling technique (K-Means BS). Experimental analysis showed that the dataset synthesized by K-Means BS effectively avoids boundary and duplicate sample problems. (2) We proposed the triple weighted gcForest algorithm (TW-gcForest), which adds sliding window attention and forest attention to the standard gcForest algorithm, allowing the deep forest to extract more valuable multi-level features for deeper information mining. In a comparison with other machine learning models, TW-gcForest achieved the best performance. Its accuracy, precision, recall, F1 score, and AUC are 0.9624, 0.9528, 0.9711, 0.9619, and 0.9933, respectively, all higher than those of the standard gcForest algorithm. (3) To provide personalized interpretability of atherosclerosis risk, we used the Extreme Gradient Boosting algorithm (XGBoost) and SHapley Additive exPlanations (SHAP). We randomly selected two high-risk patients and two control-group patients for detailed personalized risk factor analysis, and explored the feature dependencies of the two pairs for atherosclerosis risk assessment. Experimental analysis showed that this method provides sample-level interpretability. The results of this study indicate that small-sample and interpretable methods can be widely applied in atherosclerosis research and provide valuable information to guide clinical treatment and prevention. |
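
As a point of reference for part (1), the interpolation step of baseline SMOTE can be sketched as below. This is a minimal illustration of the standard technique that K-Means BS is said to improve on, not the K-Means BS procedure itself, whose details the abstract does not give. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration only.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by classic SMOTE interpolation.

    Hypothetical helper for illustration; not the paper's K-Means BS.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate: new = base + lam * (neighbour - base), lam in [0, 1).
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class samples with two features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours. When such segments run close to the class boundary or the same pairs are drawn repeatedly, the generated samples can be ambiguous or near-duplicates, which are the problems the abstract says K-Means BS is designed to avoid.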