| Objective: 1. To establish an intelligent syndrome differentiation model for non-alcoholic fatty liver disease (NAFLD) and partially automate the diagnosis of TCM syndromes. 2. To study the pattern of syndrome classification in NAFLD as a reference for clinical practice. Methods: 1. Recruiting participants. From December 2018 to December 2020, patients with NAFLD were recruited at the Health Management Center of The Affiliated Hospital of Chengdu University of Traditional Chinese Medicine. Patients’ tongue and pulse manifestation data were collected with the Daosheng four-diagnostic instrument. Tongue indicators included tongue color, tongue coating, tongue thickness, tongue body size, curdy fur, slimy fur, teeth marks, spotted tongue, fissured tongue, and ecchymosis. Pulse indicators included pulse depth, pulse rate, pulse shape, vacuous pulse, string-like pulse, slippery pulse, bound pulse, and intermittent pulse. Symptoms included insomnia, fatigue, low mood, dizziness, palpitation, sweating, and others, for a total of 60 indicators. 2. Diagnosing TCM syndromes. Each patient was independently diagnosed by three TCM experts according to the syndrome differentiation standards recommended in the Consensus opinions of TCM experts on the diagnosis and treatment of non-alcoholic fatty liver disease (2017 edition). The standard divides NAFLD into five TCM syndromes: dampness turbidity, liver depression and spleen deficiency, dampness and heat accumulation, phlegm and blood stasis, and spleen and kidney deficiency. Based on the three experts' results, the patients were divided into two types: patients for whom the three experts' syndrome diagnoses were completely consistent were classified as the non-controversial type; all other patients were classified as the controversial type. 3. Establishing the intelligent syndrome differentiation model. In the first step, all the included patients with non-alcoholic fatty liver 
disease were taken as study subjects, and seven machine learning algorithms were applied: decision tree, random forest, logistic regression, K-nearest neighbor, naive Bayes, support vector machine, and XGBoost. Area under the curve (AUC) and accuracy were used to evaluate model performance, and cross-validation was used for internal validation. The first prediction model was constructed to identify patients with non-controversial TCM syndrome. In the second step, the same seven algorithms were used to build the second prediction model for intelligent diagnosis of the TCM syndrome type in patients with non-controversial TCM syndrome. In the second prediction model, tongue and pulse manifestations and symptoms were the features (predictor variables) and the TCM syndrome type was the label (outcome variable). 4. Analyzing the characteristics of patients with controversial TCM syndrome. We analyzed: (1) the frequency and proportion of tongue and pulse manifestations and symptoms in these patients; (2) the correlations between characteristics that are indicative of specific TCM syndromes; (3) the frequency and proportion of different TCM syndrome combinations in these patients. 5. Studying the pattern of syndrome classification in NAFLD. All the included patients with NAFLD were taken as study subjects, and cluster analysis was applied to the tongue, pulse, and symptom data collected with the Daosheng four-diagnostic instrument. Hierarchical clustering was used for feature selection among the 60 symptom features, and the K-modes algorithm was then used to cluster the NAFLD patients. By comparing the TCM characteristics across clusters (subgroups), we named each cluster based on professional knowledge. Results: 1. A total of 739 eligible cases were included in the data analysis. There were 597 males (80.8%) and 142 females (19.2%), aged 46.6 ± 
10.8 years, with a body mass index (BMI) of 27.1 ± 3.0; 215 patients (29.1%) had a smoking history, 97 patients (13.1%) had hypertension, and 59 patients (8.0%) had diabetes. There were 615 patients (83.6%) with dyslipidemia. 2. There were 409 patients (55.3%) with non-controversial syndrome and 330 patients (44.7%) with controversial syndrome. Among the non-controversial patients, 196 (47.9%) had liver depression and spleen deficiency, 122 (29.8%) had phlegm and blood stasis, 53 (13.0%) had dampness turbidity, 29 (7.1%) had dampness and heat accumulation, and 9 (2.2%) had spleen and kidney deficiency. 3. The average AUC values of the first prediction model (intelligent identification of patients with non-controversial syndrome) were: decision tree 0.729, random forest 0.833, logistic regression 0.815, K-nearest neighbor 0.694, naive Bayes 0.782, support vector machine 0.785, and XGBoost 0.884. The average accuracy was 80.2% for decision tree, 84.7% for random forest, 84.9% for logistic regression, 80.1% for K-nearest neighbor, 81.6% for naive Bayes, 84.3% for support vector machine, and 85.8% for XGBoost. The average accuracies of the second prediction model (intelligent syndrome differentiation for non-controversial patients), listed in the same algorithm order, were 86.0%, 88.9%, 87.6%, 83.3%, 85.4%, 90.8%, and 90.9% (XGBoost), respectively. The five AUC values obtained by the XGBoost model were: dampness and heat accumulation syndrome 0.99, liver depression and spleen deficiency syndrome 0.84, dampness turbidity syndrome 0.95, phlegm and blood stasis syndrome 0.97, and spleen and kidney deficiency syndrome 0.92. 4. The common symptoms of patients with controversial TCM syndrome were: fatigue, bitter taste in the mouth, dry eyes, sweating, forgetfulness, sticky stool, dry mouth, phlegm, insomnia, hot urine, heavy body, chest tightness or abdominal distension, foreign body sensation in the throat, and dizziness. Common tongue and pulse manifestations were: dark red or dark purple tongue, fat tongue, teeth marks, yellow 
fur, thick and greasy fur, weak pulse, and string-like pulse. The correlated pairs or groups included: (dizziness, chest tightness or abdominal distension, heavy body); (dizziness, excessive sighing, fatigue); (chest tightness or abdominal distension, sticky stool); (excessive sighing, heavy body); and (excessive sighing, dark face). The common combinations of different syndrome types were: dampness turbidity with liver depression and spleen deficiency; dampness and heat accumulation with phlegm and blood stasis; and phlegm and blood stasis with liver depression and spleen deficiency. 5. The 60 symptom features were reduced to 10 by variable clustering. When the samples were clustered, the optimal number of clusters was determined to be 3 by the elbow method. The three clusters formed by K-modes differed significantly in tongue color, tongue shape, fur color, pulse rate, pulse momentum, and other aspects (P < 0.05). The main clinical manifestations of cluster 1 patients were: no obvious symptoms, mainly with light red tongue and white thick greasy fur. The main clinical manifestations of cluster 2 patients were: fatigue, mainly with white or purple tongue, teeth marks, white thick greasy fur, vacuous pulse, and string-like pulse. The main clinical manifestations of cluster 3 patients were: fatigue, dizziness, palpitation, sweating, dry mouth, foreign body sensation in the throat, depression, greasy nose and face, sticky mouth, heavy body, and burning sensation when urinating, mainly with dark red or dark purple tongue, teeth-marked tongue, yellow fur, vacuous pulse, and string-like pulse. Conclusions: 1. In this study, an intelligent syndrome differentiation model for NAFLD was constructed from tongue, pulse, and symptom data collected with a TCM four-diagnostic instrument, using common machine learning algorithms. This shows that applying machine learning to TCM intelligent syndrome differentiation is feasible. In addition, this study verifies the feasibility of staged modeling, which provides a reference for model 
optimization strategies in intelligent syndrome differentiation research. 2. The prediction model built with the XGBoost algorithm can intelligently identify NAFLD patients with non-controversial syndrome and perform intelligent syndrome differentiation for these patients with high accuracy, outperforming the models built with the other six algorithms. This study provides an experimental foundation for automating TCM syndrome differentiation. 3. NAFLD patients with controversial TCM syndrome have diverse symptoms, suggesting that several pathological factors, including phlegm, dampness, heat, blood stasis, liver depression, and spleen deficiency, may be present simultaneously. The pathogenesis is complicated, which leads to differences in syndrome diagnosis among TCM experts. For such patients, it is difficult to establish an intelligent syndrome differentiation model, and manual syndrome differentiation remains the main method. 4. NAFLD could be divided into three TCM syndromes: phlegm-dampness type, spleen deficiency with phlegm-dampness type, and spleen deficiency with damp-heat and blood stasis type. Among them, the spleen deficiency with damp-heat and blood stasis type is a newly discovered syndrome type, which provides a new insight for TCM syndrome differentiation of NAFLD. |
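
The expert-agreement rule and the staged (two-step) modeling described in the Methods can be sketched as follows. This is a minimal illustration in Python, not the study's implementation: the function names `agreement_label` and `staged_predict`, the label strings, the feature names, and the toy stand-in models are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Label strings are illustrative assumptions, not identifiers from the study.
NON_CONTROVERSIAL = "non-controversial"
CONTROVERSIAL = "controversial"

Features = Dict[str, int]  # hypothetical encoding: indicator name -> coded value


def agreement_label(diagnoses: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Label one patient from the experts' independent syndrome diagnoses.

    Returns (type, syndrome). The syndrome is None when the experts disagree.
    """
    if len(diagnoses) == 0:
        raise ValueError("at least one diagnosis is required")
    if len(set(diagnoses)) == 1:
        # Complete agreement: the shared diagnosis becomes the label.
        return NON_CONTROVERSIAL, diagnoses[0]
    return CONTROVERSIAL, None


def staged_predict(
    features: Features,
    stage1: Callable[[Features], str],
    stage2: Callable[[Features], str],
) -> Optional[str]:
    """Run the two stages in sequence.

    Stage 1 decides whether the case looks non-controversial. Only then does
    stage 2 predict a syndrome type. Otherwise None is returned, meaning the
    case is left to manual syndrome differentiation.
    """
    if stage1(features) != NON_CONTROVERSIAL:
        return None
    return stage2(features)


if __name__ == "__main__":
    # Toy stand-ins for trained classifiers (assumptions, not real models).
    def toy_stage1(f: Features) -> str:
        return NON_CONTROVERSIAL if f.get("fatigue", 0) == 0 else CONTROVERSIAL

    def toy_stage2(f: Features) -> str:
        return "dampness turbidity"

    print(agreement_label(["phlegm and blood stasis"] * 3))
    print(staged_predict({"fatigue": 0}, toy_stage1, toy_stage2))
    print(staged_predict({"fatigue": 1}, toy_stage1, toy_stage2))
```

In practice, `stage1` and `stage2` would wrap trained classifiers such as the XGBoost models reported above; returning `None` for controversial cases mirrors the conclusion that manual syndrome differentiation remains the main method for those patients.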