| Disease diagnosis is a fundamental issue in medical research and clinical practice. Unlike the disease-based diagnostic system of modern Western medicine, Traditional Chinese Medicine (TCM) diagnosis is based on syndromes, around which the individualized clinical mode of syndrome differentiation and treatment has formed. Phenotypic information from symptoms and signs is the main basis for syndrome diagnosis. It is therefore of great value for TCM clinical diagnosis and basic research to identify the similarities and differences between Chinese and Western diagnostic phenotypes, to explore disease subtypes that combine diseases with syndromes, and to establish a classification system for TCM syndrome diagnosis. In this paper, we integrate clinical and basic data from Chinese and Western medicine to analyze the similarities and differences between the phenotypic characteristics of their diagnoses. On this basis, taking a specific group of chronic diseases (liver diseases) as the example, we study disease subtyping that combines diseases and syndromes, and machine-learning methods for TCM syndrome diagnosis. In this way, we explore the regularities of TCM diagnostic phenotypes and methods for classifying them automatically, providing a basis for research on TCM-assisted diagnosis. The main work of this paper is as follows. First, to compare Chinese and Western diagnosis, we construct symptom-based similarity networks of TCM syndromes and of Western diseases from the national diagnostic criteria and from PubMed literature data, respectively. A comparative analysis of several network topology measures shows that the degree and betweenness of TCM syndromes are larger than those in the Western disease network, suggesting that TCM syndrome diagnoses are relatively easy to confuse with one another and that their phenotypic connotations and mechanisms are highly diverse; homogeneity within communities is also lower, highlighting the multi-dimensional character of TCM syndromes. Assortative mixing is a feature common to both the Chinese and the Western diagnosis networks, indicating a phenotypic heterogeneity shared by medical diagnosis in general. Second, for disease subtyping that combines diseases and syndromes, we integrate and process 6,475 inpatient liver disease records from Hubei Province Traditional Chinese Medicine Hospital and construct a patient similarity network based on the symptoms and signs recorded in the electronic medical records. We then identify subtypes by network community detection and characterize their associated diseases, herbs, TCM syndromes and other clinical manifestations with statistical measures such as the chi-square test and relative risk. Among the 303 subtypes obtained, the six largest subtypes and three medium-sized subtypes show consistent clinical and molecular-network characteristics, supporting the feasibility and clinical value of disease subtyping that combines diseases and syndromes. Finally, we study algorithms for TCM syndrome diagnosis from two angles: single-label and multi-label classification. For single-label classification, we apply logistic regression, SVM, random forest and neural networks, each combined with feature selection; the best F1 score, obtained for the liver depression and spleen deficiency syndrome, is 0.8260. Using deep-learning feature representations further improves classification performance, raising F1 to 0.8550. These results show the benefit of feature selection and deep learning for TCM syndrome diagnosis. For multi-label classification, we select the eight syndromes with the most patients and apply OneVsRestClassifier (OneVsRest), BinaryRelevance (BR), MLkNN, LabelPowerset (LP) and ClassifierChain (CC); micro-F1 reaches up to 0.7057, and after the syndromes are split, it reaches a maximum of 0.7587. BR and LP performed well for syndrome diagnosis. |
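The symptom-based diagnosis similarity network and two of its topology measures can be illustrated with a minimal standard-library Python sketch. It assumes that each diagnosis (syndrome or disease) is represented by a set of symptoms, that two diagnoses are linked when the Jaccard similarity of their symptom sets reaches a threshold, and that "assortative mixing" refers to degree assortativity (Newman's r, where r > 0 means assortative). The Jaccard measure, the 0.2 threshold and the syndrome and symptom names are hypothetical, not taken from the thesis; betweenness is omitted for brevity.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two symptom sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_network(diagnoses: dict, threshold: float = 0.2) -> dict:
    """Undirected adjacency sets linking diagnoses whose symptom-set
    similarity is at least `threshold` (hypothetical linking rule)."""
    adj = {name: set() for name in diagnoses}
    for u, v in combinations(diagnoses, 2):
        if jaccard(diagnoses[u], diagnoses[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_assortativity(adj: dict) -> float:
    """Newman's degree assortativity r: Pearson correlation between the
    degrees at the two ends of every edge. r > 0 means assortative mixing."""
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge is visited in both directions
            xs.append(deg[u])
            ys.append(deg[v])
    if not xs:
        return float("nan")  # no edges: r is undefined
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # all endpoints share one degree: r is undefined
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: diagnosis -> set of symptoms.
diagnoses = {
    "syndrome_A": {"fatigue", "poor_appetite", "loose_stool"},
    "syndrome_B": {"fatigue", "poor_appetite", "hypochondriac_pain"},
    "syndrome_C": {"hypochondriac_pain", "irritability", "bitter_taste"},
    "syndrome_D": {"fever", "cough"},
}
adj = build_similarity_network(diagnoses, threshold=0.2)
degrees = {n: len(nbrs) for n, nbrs in adj.items()}
print(degrees)  # {'syndrome_A': 1, 'syndrome_B': 2, 'syndrome_C': 1, 'syndrome_D': 0}
print(degree_assortativity(adj))  # -1.0
```

This toy graph is a small star around `syndrome_B`, so it is perfectly disassortative (r = -1); larger real networks with r > 0 would show the assortative mixing described in the abstract.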
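Characterizing a subtype with the chi-square test and relative risk amounts to comparing how often a clinical feature (a diagnosis, herb or TCM syndrome) occurs inside the subtype versus among all other patients. The sketch below, in standard-library Python, computes both statistics from a 2x2 contingency table. It assumes a Pearson chi-square without continuity correction and one degree of freedom; the abstract does not state the exact test settings or significance thresholds, and the counts are hypothetical.

```python
from math import erfc, sqrt


def subtype_feature_association(a: int, b: int, c: int, d: int):
    """Association of one clinical feature with one patient subtype.

    2x2 contingency table (patient counts):
                      feature present   feature absent
        in subtype          a                 b
        other patients      c                 d

    Returns (relative risk, Pearson chi-square without continuity
    correction, p-value with 1 degree of freedom). Assumes c > 0 and
    that no row or column total is zero.
    """
    n = a + b + c + d
    # Relative risk: feature rate inside the subtype / feature rate outside it.
    rr = (a / (a + b)) / (c / (c + d))
    # Pearson chi-square for a 2x2 table: n(ad - bc)^2 / product of the margins.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return rr, chi2, p


# Hypothetical counts: 50 patients in the subtype, 150 outside it.
rr, chi2, p = subtype_feature_association(a=30, b=20, c=40, d=110)
print(f"RR = {rr:.2f}, chi2 = {chi2:.2f}, p = {p:.1e}")
```

Here 60% of subtype patients carry the feature versus about 27% of the others, so RR = 2.25; in practice a feature would be reported as subtype-specific only if RR > 1 and p passes a (multiple-testing corrected) threshold.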
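The multi-label methods differ mainly in how they transform the label sets. Binary Relevance (BR) trains one independent binary classifier per syndrome, Label Powerset (LP) turns each distinct combination of syndromes into one class of a single multi-class problem, and micro-F1 pools true positives, false positives and false negatives over all labels before computing F1. The standard-library Python sketch below shows only these target transformations and the metric, not the classifiers themselves; the label indices standing in for syndromes are hypothetical. The method names in the text match classes in scikit-learn (`OneVsRestClassifier`) and scikit-multilearn (`BinaryRelevance`, `MLkNN`, `LabelPowerset`, `ClassifierChain`), but the abstract does not say which implementations were used.

```python
def binary_relevance_targets(Y, n_labels):
    """BR: one 0/1 target vector per label; each would train its own binary classifier."""
    return [[1 if j in labels else 0 for labels in Y] for j in range(n_labels)]


def label_powerset_targets(Y):
    """LP: map each distinct label combination to one class id of a multi-class problem."""
    class_ids = {}
    targets = []
    for labels in Y:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


def micro_f1(Y_true, Y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all samples and labels."""
    tp = fp = fn = 0
    for true, pred in zip(Y_true, Y_pred):
        true, pred = set(true), set(pred)
        tp += len(true & pred)
        fp += len(pred - true)
        fn += len(true - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: each patient's syndromes as label indices 0..2.
Y_true = [{0, 1}, {1}, {2}, {0, 2}]
Y_pred = [{0}, {1, 2}, {2}, {0, 2}]
print(binary_relevance_targets(Y_true, 3))  # [[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
print(label_powerset_targets(Y_true)[0])    # [0, 1, 2, 3]
print(round(micro_f1(Y_true, Y_pred), 4))   # 0.8333
```

BR ignores correlations between syndromes but scales linearly with the number of labels, whereas LP captures co-occurring syndromes at the cost of one class per observed combination, which grows quickly as more syndromes are included.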