| Diabetic kidney disease (DKD) is a common complication of diabetes and is extremely harmful to patients. Machine learning methods based on DKD electronic medical records generally build a single global model on all data, but a global model is unfair when predicting for heterogeneous patient subgroups. Subgroup research is a feasible solution to this problem. However, most current subgroup research focuses on knowledge discovery from the obtained subgroups and often ignores the prediction performance of those subgroups. To address these problems, this paper proposes Subgroup-modeling based on Decision Trees and Transfer learning (SDTT), which improves overall predictive performance by identifying heterogeneous subgroups and then modeling each subgroup individually to form a subgroup model library. Compared with the classic decision tree algorithm, SDTT makes the following improvements: 1) SDTT's grouping decision tree takes maximizing recall after node splitting as its optimization goal, so the tree tends to split off the subgroups with the greatest improvement in recall. 2) SDTT introduces transfer learning when training the grouping decision tree and building the subgroup model library. This addresses the degradation in child-node model performance that the classic decision tree algorithm suffers when node sample size drops rapidly after a split. 3) SDTT addresses the instability of the grouping decision tree's structure across repeated experiments by repeatedly selecting split points and voting when nodes are split. The study data cover 11,559 adults with type 2 diabetes from a partner medical center. To further improve the interpretability and rationality of the subgroups, this study combines expert knowledge with a data-driven approach, using candidate attributes selected by expert knowledge in the SDTT grouping. The experiment yielded 5 subgroups. After the subgroups were modeled separately, overall recall, weighted by each subgroup's share of the samples, improved by 0.0338 compared with the global model. The SDTT framework proposed in this study effectively addresses the unfairness of the global model on heterogeneous subgroups and combines expert knowledge with artificial intelligence to interpret highly heterogeneous subgroups. These subgroups can aid clinical practice. |
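
The overall recall weighted by subgroup sample size, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `recall` and `weighted_recall` and the toy labels are hypothetical, and the sketch assumes binary labels with 1 as the positive (DKD) class.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def weighted_recall(subgroups):
    """Average per-subgroup recall, weighting each subgroup by its sample share.

    `subgroups` is a list of (y_true, y_pred) pairs, one pair per subgroup.
    """
    total = sum(len(y_true) for y_true, _ in subgroups)
    return sum(
        len(y_true) / total * recall(y_true, y_pred)
        for y_true, y_pred in subgroups
    )


# Hypothetical toy data: two subgroups with 4 and 2 samples.
toy_subgroups = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # recall 0.5, weight 4/6
    ([1, 0], [1, 0]),              # recall 1.0, weight 2/6
]
overall = weighted_recall(toy_subgroups)
```

Under this assumed definition, comparing `overall` with the recall of a single global model on the same samples gives the kind of difference the abstract reports.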