PART I INDEPENDENT PREDICTORS AND NEW PREDICTION MODEL FOR EARLY DIAGNOSIS OF KAWASAKI DISEASE: A RETROSPECTIVE STUDY BASED ON BIG DATA

Objective: Kawasaki disease (KD) is a systemic vasculitis of unknown etiology that occurs mainly in children under 5 years of age. It has become the leading cause of acquired heart disease in children in developed countries, and delayed diagnosis can lead to serious cardiovascular complications. We therefore aimed to build a diagnostic model from retrospective data that distinguishes children with KD from children with other febrile diseases (FCs) at an early stage, so that KD patients can be diagnosed promptly and treated actively.

Methods: (1) We retrospectively reviewed the electronic medical records of consecutive KD and FCs patients treated at Chongqing Children's Hospital from October 2007 to December 2017. According to their discharge diagnosis, these patients were divided into two groups: the KD group and the FCs group. (2) The raw data, mainly patients' demographic data, clinical characteristics, laboratory test results and imaging reports, were cleaned using SQL Server 2008. (3) Statistical analysis: continuous variables were compared between the two groups with the Mann-Whitney U test and categorical variables with the chi-square test; P < 0.05 was considered statistically significant. Variables that differed significantly in this univariate analysis were carried forward: LASSO logistic regression was used for further feature selection, and multivariable logistic regression was then applied to the selected variables to identify independent predictors of KD, with odds ratios (ORs) and 95% confidence intervals (CIs) calculated. The ORs were used to assign a score to each independent risk factor and build the new
prediction model. The predictive ability, sensitivity and specificity of the prediction model were evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC).

Results: (1) A total of 10,367 children were included: 5,642 in the KD group (54.42%) and 4,725 in the FCs group (45.58%). In addition, 809 cases of incomplete KD were collected. (2) In the univariate analysis of clinical and laboratory results, twenty-four variables were significantly higher in the KD group than in the FCs group (e.g., white blood cell count (WBC), platelet count (PLT) and globulin), and thirty-two variables were significantly lower (e.g., monocyte percentage (%MON), lymphocyte percentage (%LYM) and phosphorus). (3) To balance accuracy and simplicity, LASSO was applied to the variables that differed significantly in the univariate analysis, which finally identified twelve variables; multivariable logistic regression was then performed on these twelve variables. The significant independent predictors of KD were lower levels of %MON, phosphorus, uric acid (UA), %LYM, prealbumin, AST:ALT ratio, serum chloride and lactate dehydrogenase (LDH); higher levels of globulin, gamma-glutamyl transferase (GGT) and PLT; and age. (4) Compared with previously published KD diagnostic models, the new early-diagnosis model achieved a higher AUC (0.906 ± 0.006), sensitivity (86.0 ± 0.9%) and specificity (80.5 ± 1.5%). On a validation set consisting of the 809 patients with incomplete KD, the new model achieved an AUC of 0.816.

Conclusion: In this study, KD could be predicted from age and the levels of %MON, GGT, %LYM, phosphorus, PLT, AST:ALT ratio, UA, serum chloride, globulin, LDH and prealbumin. The new 12-variable model for early KD diagnosis showed better diagnostic performance than the
previous models.

PART II INTEGRATING CO-CLUSTERING AND MACHINE LEARNING FOR THE PREDICTION OF INTRAVENOUS IMMUNOGLOBULIN RESISTANCE IN KAWASAKI DISEASE

Objective: Part I of this study helps clinicians diagnose KD early, but intravenous immunoglobulin (IVIG) resistance may occur during subsequent treatment. We therefore propose a multi-classifier system based on co-clustering and interpretable machine learning to identify IVIG-resistant patients and guide clinical medication, so as to reduce adverse reactions.

Methods: (1) Clinical data from the Children's Hospital of Chongqing Medical University from 2007 to 2016 were collected retrospectively. According to their response to IVIG treatment, KD patients were divided into two groups: an IVIG-responsive group and an IVIG-resistant group. (2) Several baseline models were implemented: regression models (logistic regression, Lasso and ridge regression), machine learning models (decision tree (DT), k-nearest neighbors (KNN), multinomial naive Bayes (MNB) and multilayer perceptron (MLP)), and ensemble learning methods (random forest (RF), LightGBM (GBM), XGBoost (XGB) and Explainable Boosting Machine (EBM)). EBM is an interpretable machine learning algorithm that offers higher accuracy and intelligibility than traditional machine learning algorithms. These baseline methods were then enhanced with the proposed co-clustering-based framework. (3) Five metrics were used to evaluate each prediction model: AUC, average precision (AP), accuracy, recall and F1 score.

Results: (1) A total of 3,017 KD patients were included: 459 IVIG-resistant (15.21%) and 2,558 IVIG-responsive (84.79%). (2) The performance of the 10 machine learning algorithms was compared using these five metrics. The EBM prediction model achieved the highest AUC
value (0.917 ± 0.021), average precision (0.835 ± 0.022), recall (0.669 ± 0.051) and F1 score (0.773 ± 0.021). (3) The EBM model was also used to characterize IVIG-resistant patients: the eight highest-ranked risk factors were brain natriuretic peptide (BNP), PLT, albumin, erythrocyte sedimentation rate (ESR), hemoglobin (HB), C-reactive protein (CRP), total bilirubin (TB) and alanine aminotransferase (ALT).

Conclusion: In this study, 10 machine learning algorithms were applied to clinical data from retrospectively collected electronic medical records, and the EBM prediction model outperformed the other models. The model also identified 8 risk factors. This study therefore identified the best-performing machine learning model for predicting IVIG resistance and showed the relative importance of its predictive features.

PART III REAL-WORLD DATA-DRIVEN CLINICAL PREDICTION MODEL FOR CORONARY ARTERY LESION IN KAWASAKI DISEASE

Objective: Parts I and II help clinicians diagnose and treat KD patients early, but some KD patients still develop coronary artery lesions (CALs) after early IVIG treatment. Based on real-world clinical data, this part uses machine learning algorithms to predict early whether CALs will be present 30 days after KD onset and to identify risk factors for CALs in KD, so as to guide their early clinical prevention.

Methods: (1) We retrospectively collected the electronic medical records of KD patients hospitalized and followed up at Chongqing Children's Hospital from January 2014 to December 2018. Patients were divided into a CALs group and an NCALs group according to the presence or absence of CALs at day 30 of illness. (2) Univariate analysis was performed on all clinical data of the CALs and NCALs groups. Three scenarios were considered: 1) CALs on admission were not
considered; 2) patients had CALs before admission for IVIG treatment; 3) patients had not developed CALs before IVIG treatment. For each of these three scenarios, prediction models were built with several classification algorithms: logistic regression, MNB, MLP, RF, support vector machine (SVM), XGB and EBM. Four metrics were used to evaluate each model: AUC, sensitivity, specificity and accuracy.

Results: (1) A total of 2,089 children were included: 682 in the CALs group (32.65%) and 1,407 in the NCALs group (67.35%). (2) In the univariate analysis of clinical and laboratory results, the statistically significant variables (P < 0.005) differed by scenario: 1) regardless of whether CALs were present on admission: male sex, age, CRP, ESR, etc.; 2) patients with CALs before admission for IVIG treatment: male sex, CRP, ESR, albumin, etc.; 3) patients without CALs before admission for IVIG treatment: male sex, CRP, procalcitonin (PCT), ALT, etc. (3) Among the prediction models built on these three sets of univariate results, the RF model achieved the highest AUC (0.872), specificity (0.832) and accuracy (0.813).

Conclusion: In this study, machine learning methods were used to build early prediction models for CALs in KD, risk factors for CALs were identified in each patient subgroup, and the RF prediction model performed best.
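The univariate screening step in Part I (Mann-Whitney U test for continuous variables, chi-square test for categorical variables, P < 0.05 to carry a variable forward) can be illustrated with a minimal, dependency-free Python sketch. It assumes the large-sample normal approximation for U (with a tie correction but no continuity correction) and the Pearson chi-square test on a 2 x 2 table without Yates' correction; the abstract does not state which test variants or software were used, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def _average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected, no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = _average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts: dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (df = 1, no Yates correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: the test is undefined
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    _, p_cont = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
    _, p_cat = chi_square_2x2(30, 10, 10, 30)
    candidates = [("continuous_var", p_cont), ("categorical_var", p_cat)]
    print([name for name, p in candidates if p < 0.05])  # variables carried forward
```

For real analyses, `scipy.stats.mannwhitneyu` and `scipy.stats.chi2_contingency` provide tested implementations; note that both apply a continuity correction by default, so their p-values can differ slightly from this sketch.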
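For the LASSO step in Part I, the following sketch fits an L1-penalized logistic regression by proximal gradient descent (ISTA) in plain Python and reports which coefficients remain non-zero. The solver, the fixed step size derived from a Lipschitz bound, the penalty value and the toy data are all illustrative assumptions; the thesis does not name its software or tuning procedure. In the abstract, LASSO serves only for variable selection, and the ORs come from the subsequent unpenalized multivariable logistic regression on the selected variables.

```python
from __future__ import annotations

import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v toward zero and clips small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float, n_iter: int = 2000
) -> tuple[float, list[float]]:
    """Minimize mean logistic loss + lam * sum(|w_j|) by ISTA; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    # Step size 1/L, with L bounded by ||[1, X]||_F^2 / (4n) for the mean logistic loss.
    step = 4.0 * n / (n + sum(v * v for row in X for v in row))
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals sigmoid(b0 + x_i . w) - y_i drive the gradient of the logistic loss.
        r = [_sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, row))) - yi for row, yi in zip(X, y)]
        grad = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(p)]
        b0 -= step * sum(r) / n  # plain gradient step for the intercept
        w = [_soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return b0, w


def selected_features(w: Sequence[float]) -> list[int]:
    """Indices of features whose LASSO coefficient is non-zero."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


if __name__ == "__main__":
    # Hypothetical centered toy data: column 0 is informative, column 1 is noise.
    X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, -1.0], [-0.5, 1.0],
         [0.5, 1.0], [1.0, -1.0], [1.5, -1.0], [2.0, 1.0]]
    y = [0, 0, 0, 1, 0, 1, 1, 1]
    _, w = lasso_logistic(X, y, lam=0.05)
    print(selected_features(w))  # only the informative column survives
```

Features are assumed to be on comparable scales (for example, standardized) so that a single penalty weight treats every variable alike; in practice the penalty is usually chosen by cross-validation, and scikit-learn's `LogisticRegression` with an L1 penalty fits the same kind of model.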
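Part I states only that the ORs were used to score each independent risk factor. One common points-based convention, assumed here rather than taken from the thesis, treats each predictor as a dichotomized flag (for example, a level above or below a cutoff), converts its OR to a log-odds coefficient, divides by the smallest absolute coefficient and rounds, so that the weakest predictor is worth ±1 point; a patient's score is then the sum of the points for the flags that apply. The factor names and ORs below are hypothetical.

```python
import math
from typing import Dict, Mapping


def or_to_points(odds_ratios: Mapping[str, float]) -> Dict[str, int]:
    """Convert ORs to integer points: log-odds scaled so the weakest predictor is worth +/-1."""
    betas = {name: math.log(value) for name, value in odds_ratios.items()}
    unit = min(abs(b) for b in betas.values())  # assumes no OR is exactly 1
    return {name: round(b / unit) for name, b in betas.items()}


def total_score(points: Mapping[str, int], flags: Mapping[str, bool]) -> int:
    """Sum the points of every dichotomized risk factor that is present for one patient."""
    return sum(points[name] for name, present in flags.items() if present)


if __name__ == "__main__":
    # Hypothetical ORs for three dichotomized factors (not results from the study).
    points = or_to_points({"factor_a": 2.0, "factor_b": 4.0, "factor_c": 0.5})
    print(points)
    print(total_score(points, {"factor_a": True, "factor_b": True, "factor_c": False}))
```

A protective factor (OR < 1) receives negative points under this convention; the resulting total score can then be evaluated with an ROC curve in the same way as a predicted probability.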
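All three parts evaluate their models with threshold-free metrics (AUC, AP) and threshold-dependent ones (sensitivity or recall, specificity, accuracy, F1). The pure-Python sketch below computes them for binary labels and predicted probabilities. AUC is computed from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the Mann-Whitney U statistic divided by n1·n0), AP as the step-wise sum of precision weighted by recall increments, and the operating threshold via Youden's index (sensitivity + specificity - 1). The use of Youden's index is an assumption: the abstract does not say how cutoffs were chosen, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def _div(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))  # equals U / (n1 * n0)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum over thresholds of (recall increase) * precision, highest score first."""
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score so ties share one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = _div(tp, n_pos)
        ap += (recall - prev_recall) * _div(tp, tp + fp)
        prev_recall = recall
    return ap


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> dict[str, float]:
    """Confusion-matrix metrics when a score >= threshold is called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sens, prec = _div(tp, tp + fn), _div(tp, tp + fp)
    return {
        "sensitivity": sens,  # also called recall
        "specificity": _div(tn, tn + fp),
        "precision": prec,
        "accuracy": _div(tp + tn, len(pairs)),
        "f1": _div(2 * prec * sens, prec + sens),
    }


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Candidate score maximizing Youden's J = sensitivity + specificity - 1 (first one on ties)."""
    def j_stat(t: float) -> float:
        m = threshold_metrics(y_true, scores, t)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(sorted(set(scores)), key=j_stat)


if __name__ == "__main__":
    y, s = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]  # toy labels and predicted probabilities
    print(roc_auc(y, s), average_precision(y, s))
    print(threshold_metrics(y, s, youden_threshold(y, s)))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration; for cohorts of several thousand patients, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is preferable.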