Objective
Based on the theory of syndrome element differentiation and the concept of hierarchical TCM syndrome diagnosis, this study proposes hierarchical diagnostic criteria for TCM syndromes in patients with lung cancer. On the basis of these criteria and machine learning algorithms, a quantitative hierarchical diagnosis model of TCM syndromes of lung cancer was constructed and evaluated, with the aim of providing clinicians and researchers with a reliable, easy-to-use diagnostic model. The model can help standardize and quantify the diagnosis of TCM syndromes of lung cancer, and it offers new ideas and methodological references for the quantitative diagnosis of TCM syndromes.

Methods
① Literature on TCM syndromes of lung cancer indexed in the CNKI, Wanfang and VIP Chinese academic journal databases and published between January 1, 2000 and January 1, 2022 was retrieved. After screening against the inclusion and exclusion criteria, the syndrome diagnoses reported in previous clinical studies of lung cancer were decomposed into syndrome elements according to Syndrome Elements Differentiation and counted by frequency, in order to analyze the distribution of pathogenic and location syndrome elements in patients with lung cancer.
② Medical records of patients with primary lung cancer were collected from the outpatient and inpatient departments of the Oncology Department of Guang'anmen Hospital, China Academy of Chinese Medical Sciences. The distribution of symptoms in these patients was analyzed, and hierarchical clustering and association rule algorithms were applied to explore the clinical syndrome characteristics of lung cancer patients. Combining the results of the literature statistics and the clinical data analysis, a hierarchical diagnostic structure of TCM syndromes for lung cancer patients was proposed.
③ Following this hierarchical structure, diagnostic criteria for the relevant syndromes were extracted from Diagnostics of Traditional Chinese Medicine, Principles of Clinical Research Guidelines for New Chinese Medicine, Terms of Clinical Diagnosis and Treatment of Traditional Chinese Medicine Part 2: Syndromes (2020 edition), Syndrome Differentiation and Diagnosis of Traditional Chinese Medicine, and Syndrome Elements Differentiation, and the terminology was standardized. Based on a comparative analysis of the literature and expert opinions, hierarchical diagnostic criteria for TCM syndromes of primary lung cancer were formed.
④ Medical records of patients with primary lung cancer were collected from the outpatient and inpatient departments of the Oncology Department of Guang'anmen Hospital, China Academy of Chinese Medical Sciences, and each included record was labeled with syndromes according to the diagnostic criteria established above. MLKNN, random forest and deep neural network algorithms were used to construct multi-label classification models for syndrome differentiation of lung cancer. The three models were evaluated and compared using the classification accuracy on the overall sample, the micro-averaged F1 (miF1), the macro-averaged F1 (maF1), the Hamming loss, the Jaccard similarity coefficient, and the F1 value of each syndrome label.
⑤ The best-performing model was taken as the final result, and the SHAP algorithm was used to interpret it. Taking qi deficiency syndrome, qi stagnation syndrome, phlegm syndrome, solid heat syndrome and lung qi inverse syndrome as examples, the logic by which the model achieves quantitative syndrome diagnosis was explained, and the rationality and reliability of its syndrome differentiation process were demonstrated.

Results
① The search strategy retrieved 1149 articles, of which 95 were included after screening against the inclusion and exclusion criteria. Decomposing the reported syndromes yielded 12 pathogenic elements; the most common were qi deficiency and Yin deficiency, followed by phlegm, dampness, blood stasis, etc. Nine location elements were obtained: the lung was the principal location, closely related to the spleen, kidney, stomach and other viscera. Analysis of the relationship between pathogenic and location elements showed that the lung presents both deficiency and excess syndromes, the kidney mainly deficiency syndromes, and the stomach and liver mainly excess syndromes, while the gallbladder, bladder and large intestine are often combined with dampness and heat syndromes.
② After screening against the inclusion and exclusion criteria, 514 medical records of primary lung cancer patients were included. After standardization of the records, 208 items of information from the four diagnostic methods were extracted: 181 symptoms, 18 tongue manifestations and 9 pulse manifestations. Hierarchical clustering produced 26 clusters, which, after manual interpretation, were judged to involve 41 syndromes, including 12 pathogenic syndrome elements, namely qi deficiency, blood deficiency, Yin deficiency, Yang deficiency, fluid deficiency, deficiency heat, qi inverse, qi stagnation, blood stasis, phlegm and solid heat, and 9 location syndrome elements: lung, heart, spleen, liver, kidney, stomach, gallbladder, large intestine, and chest and flank. Based on these results, a four-level hierarchical diagnostic structure of TCM syndromes for patients with lung cancer was proposed. The first level distinguishes deficiency syndromes from excess syndromes, and the second level divides them into exterior deficiency, interior deficiency, exterior excess and interior excess syndromes. Finally, combining these with the specific disease location or subcategory gives the fourth-level diagnosis.
③ Based on the literature review and expert opinions, hierarchical diagnostic criteria for TCM syndromes of lung cancer were formed.
④ After screening against the inclusion and exclusion criteria, 1213 medical records of primary lung cancer patients were included, and three machine learning algorithms were used to construct multi-label classification models for syndrome differentiation of lung cancer. The MLKNN model had an accuracy of 0.22 and a miF1 of 0.72, with 7 syndrome labels reaching an F1 value above 0.8; the best was lung qi inverse syndrome (F1 = 0.95). The random forest model had an accuracy of 0.30 and a miF1 of 0.71, with 10 syndrome labels above 0.8; the best was lung qi inverse syndrome (F1 = 0.99). The deep neural network model had an accuracy of 0.45 and a miF1 of 0.87, with 19 syndrome labels above 0.8; the best was again lung qi inverse syndrome (F1 = 0.99).
⑤ SHAP analysis of the deep neural network model showed that shortness of breath, fatigue, spontaneous sweating and a weak pulse carried higher weight in the diagnosis of qi deficiency syndrome; depression, head and body distress, coughing of yellow phlegm, coughing of white phlegm and coughing of phlegm in the diagnosis of phlegm syndrome; chest tightness, stomach fullness, a bitter taste in the mouth, narrow pulse, and distension and tightness of the hypochondrium in the diagnosis of qi stagnation syndrome; red tongue, a bitter taste in the mouth and dry mouth in the diagnosis of solid heat syndrome; and expectoration, shortness of breath, wind and cough in the diagnosis of lung qi inverse syndrome.

Conclusions
① The pathogenic syndrome elements of primary lung cancer are characterized by a combination of deficiency and excess. Deficiency mainly involves qi deficiency and Yin deficiency, while excess mainly involves qi inverse, qi stagnation, phlegm and solid heat, which is consistent with the TCM understanding of the pathogenesis of lung cancer. The location syndrome elements indicate that the lung is the core of the disease, which also widely involves multiple viscera and is especially closely related to the spleen and stomach.
② The hierarchical diagnostic criteria for TCM syndromes of lung cancer, based on the theory of syndrome elements and the concept of hierarchical syndrome diagnosis, can describe the syndromes of lung cancer patients accurately and flexibly. Because they have clear diagnostic rules, unified terminology and a reasonable division into levels, they can serve as a semi-quantitative tool to assist clinicians and researchers in TCM syndrome differentiation of lung cancer.
③ The deep learning model based on a neural network algorithm performed best on the lung cancer medical record dataset of this study. It not only completed the multi-label classification task but also achieved relatively accurate classification on 19 syndrome labels despite the small size of the dataset. It can serve as a standardized, quantitative research tool for TCM syndrome differentiation in lung cancer patients.
④ With the model as the analysis target, the SHAP explanations for the five syndrome labels were largely consistent with the pre-established syndrome differentiation criteria, and the discrepancies could be explained by relevant prior knowledge. It can therefore be concluded that the model's syndrome differentiation logic is in line with TCM theory and has good reliability.
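The literature analysis in Methods ① decomposes each reported syndrome diagnosis into location and pathogenic syndrome elements, then counts their frequencies and how often each location co-occurs with each pathogenic element. A minimal Python sketch of that tally follows. It assumes the decomposition itself has already been done by hand. The `DECOMPOSITION` table, the syndrome names in it, and the `count_elements` helper are hypothetical illustrations, not the study's actual element inventory or procedure.

```python
from collections import Counter

# Hypothetical decomposition table: each compound syndrome name reported in the
# literature is mapped by hand to (location elements, pathogenic elements).
# These entries are illustrative only, not the study's actual element inventory.
DECOMPOSITION = {
    "qi and yin deficiency": ((), ("qi deficiency", "yin deficiency")),
    "lung-spleen qi deficiency": (("lung", "spleen"), ("qi deficiency",)),
    "phlegm-dampness obstructing the lung": (("lung",), ("phlegm", "dampness")),
    "lung-kidney yin deficiency": (("lung", "kidney"), ("yin deficiency",)),
}


def count_elements(reported):
    """Tally location elements, pathogenic elements, and location-pathogen pairs."""
    location, pathogenic, pairs = Counter(), Counter(), Counter()
    for name in reported:
        locs, paths = DECOMPOSITION[name]
        location.update(locs)
        pathogenic.update(paths)
        # Count each (location, pathogenic element) combination within one syndrome.
        pairs.update((loc, path) for loc in locs for path in paths)
    return location, pathogenic, pairs


# Hypothetical list of syndrome diagnoses extracted from the included studies.
reported = [
    "qi and yin deficiency",
    "lung-spleen qi deficiency",
    "phlegm-dampness obstructing the lung",
    "lung-kidney yin deficiency",
    "qi and yin deficiency",
]
location, pathogenic, pairs = count_elements(reported)
print(location.most_common())
print(pathogenic.most_common())
print(pairs.most_common())
```

The pair counts are what support statements such as "the kidney mainly has deficiency syndromes": grouping `pairs` by location shows which kinds of pathogenic element each viscus tends to carry.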
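Methods ④ compares the three models with several multi-label metrics. The sketch below computes them in plain Python for 0/1 label matrices (one row per patient, one column per syndrome label). Two points are assumptions, not details taken from the study. First, the "classification accuracy of the overall sample" is treated as subset (exact-match) accuracy. Second, the Jaccard coefficient is averaged per sample. Under the first assumption, a patient counts as correct only when every syndrome label is right. That would be consistent with the reported accuracies (0.22 to 0.45) being much lower than the miF1 values (0.71 to 0.87). The toy labels and predictions are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the label never occurs or is predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_metrics(y_true, y_pred):
    """Compute common multi-label metrics for 0/1 label matrices (lists of lists)."""
    n, q = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * q, [0] * q, [0] * q
    exact = mismatches = 0
    jaccard_sum = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        exact += t_row == p_row  # assumed subset accuracy: the whole label set must match
        inter = union = 0
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            tp[j] += t == 1 and p == 1
            fp[j] += t == 0 and p == 1
            fn[j] += t == 1 and p == 0
            mismatches += t != p
            inter += t == 1 and p == 1
            union += t == 1 or p == 1
        # Assumption: empty true and predicted label sets count as perfect agreement.
        jaccard_sum += inter / union if union else 1.0
    per_label_f1 = [f1(tp[j], fp[j], fn[j]) for j in range(q)]
    return {
        "subset_accuracy": exact / n,
        "micro_f1": f1(sum(tp), sum(fp), sum(fn)),  # pool counts over all labels
        "macro_f1": sum(per_label_f1) / q,  # unweighted mean of per-label F1
        "hamming_loss": mismatches / (n * q),  # fraction of wrong label slots
        "jaccard_samples": jaccard_sum / n,  # assumed per-sample averaging
        "per_label_f1": per_label_f1,
    }


# Hypothetical toy data; columns stand for e.g. qi deficiency, phlegm, lung qi inverse.
y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items() if k != "per_label_f1"})
# {'subset_accuracy': 0.25, 'micro_f1': 0.769, 'macro_f1': 0.722, 'hamming_loss': 0.25, 'jaccard_samples': 0.625}
```

On the toy data only one of four patients is predicted exactly, so subset accuracy is 0.25, while micro-F1 is about 0.77 because most individual labels are still correct. For real experiments, scikit-learn's `accuracy_score`, `f1_score` (with `average="micro"`, `"macro"` or `None`), `hamming_loss` and `jaccard_score` (with `average="samples"`) cover the same quantities, although their handling of empty label sets may differ from this sketch.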