| Objectives: Depression is a widespread and increasingly severe mental disorder. It not only damages the physical and mental health of patients but also places a heavy burden on their families and on society as a whole. Its pathogenesis is not fully understood, but it has been linked to complex genetic, epigenetic and environmental interactions. For example, many studies have found that the methylation level of the tryptophan hydroxylase-2 (TPH2) gene is related to depression. We developed three classification models (logistic regression, back-propagation (BP) neural network and support vector machine) to (1) identify patients with depression and (2) predict the prognosis of patients with depression. We aimed to select the most appropriate model for depression-related prediction and to give clinicians a reference for model selection. Methods: According to the inclusion and exclusion criteria, a total of 291 patients with depression and 100 healthy people were enrolled at Zhong Da Hospital, Nanjing. We collected their general demographic information, life event scores, childhood trauma questionnaire scores and the methylation levels of 38 specific sites of the TPH2 gene. Chi-square test, t-test, rank-based nonparametric test and stepwise logistic regression were used to select the factors that may help identify depression and predict its prognosis. We used 10-fold cross-validation to assess the generalization ability of the models. Model performance was evaluated by sensitivity, specificity, accuracy, positive predictive value, G-mean, F-measure, receiver operating characteristic curve and area under the curve. All analyses were performed using R 3.6.2. Results: In line with the two aims of this study, the results are presented in two parts. (1) Identifying patients with depression: A total of 16 variables were selected to identify patients with depression, including gender, the total score of negative life events, the total score of the childhood trauma
questionnaire and the methylation levels of 13 sites (TPH2_11_86, TPH2_11_121, TPH2_11_154, TPH2_3_92, TPH2_4_156, TPH2_5_203, TPH2_7_54, TPH2_7_184, TPH2_8_106, TPH2_9_117, TPH2_9_142, TPH2_9_160, TPH2_9_178). Cut-off points were chosen by maximizing Youden’s index. The best number of hidden nodes for the BP neural network was 2. After 10-fold cross-validation, logistic regression achieved Se 0.653, Sp 0.840, PV+ 0.922, ACC 0.701, G-mean 0.741, F-measure 0.765, AUC 0.802; the BP neural network (BPNN) achieved Se 0.900, Sp 0.800, PV+ 0.929, ACC 0.875, G-mean 0.849, F-measure 0.914, AUC 0.875. For the support vector machine, the radial basis function (RBF) kernel performed better than the other three kernel functions. The optimal RBF parameter combination was cost = 5, gamma = 0.5, giving under 10-fold cross-validation Se 0.900, Sp 0.920, PV+ 0.970, ACC 0.905, G-mean 0.910, F-measure 0.934, AUC 0.956. (2) Predicting the prognosis of patients with depression: A total of 15 variables were selected to predict the prognosis of patients with depression, including gender, age, the total score of negative life events, the total score of the childhood trauma questionnaire, partner, age at first onset, onset frequency and the methylation levels of 8 sites (TPH2_1_154, TPH2_2_139, TPH2_2_217, TPH2_5_203, TPH2_7_142, TPH2_7_170, TPH2_8_237, TPH2_9_134). Cut-off points were chosen by maximizing Youden’s index. The best number of hidden nodes for the BP neural network was 2. After 10-fold cross-validation, logistic regression achieved Se 0.661, Sp 0.586, PV+ 0.721, ACC 0.632, G-mean 0.622, F-measure 0.690, AUC 0.619; BPNN achieved Se 0.417, Sp 0.838, PV+ 0.806, ACC 0.577, G-mean 0.591, F-measure 0.549, AUC 0.638. For the support vector machine, the RBF kernel again performed better than the other three kernel functions. The optimal RBF parameter combination was cost = 1, gamma = 3.5, giving under 10-fold cross-validation Se 0.906, Sp 0.946, PV+ 0.964, ACC 0.921, G-mean 0.926, F-measure 0.934, AUC 0.970. Conclusion: In both identifying patients with depression and predicting the prognosis of depression, the ascending
order of the three models’ performance is the same: logistic regression < BP neural network < support vector machine. The support vector machine with a radial basis kernel function performed best and was significantly better than the other models, so it was the most suitable of the three for the classification problems in this research. The results also suggest that it is possible to identify patients with depression and predict their prognosis by integrating basic individual information, environmental stress level and the methylation level of TPH2, which can provide ideas and a model selection reference for similar research in the future. Thirteen methylation sites were selected for identifying patients with depression, and 8 were selected for predicting prognosis. This suggests that the methylation levels of these sites could serve as specific biomarkers for identifying patients with depression and predicting their prognosis, and can inform the selection of sites in related future studies. |
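
The summary metrics and the cut-off rule described above can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and it assumes the standard definitions: G-mean as the geometric mean of Se and Sp, F-measure as the harmonic mean of PV+ and Se, and Youden's index J = Se + Sp - 1. These definitions reproduce the reported G-mean and F-measure values to within rounding, but the function names, the "score >= cut-off means case" convention and the toy scores are illustrative assumptions, not the authors' code or data.

```python
import math


def g_mean(se, sp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(se * sp)


def f_measure(ppv, se):
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * se / (ppv + se)


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's index J = Se + Sp - 1.

    Assumptions: labels are 1 (case) or 0 (control), both classes are
    present, and a score >= cutoff is classified as a case.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the first cutoff that reaches the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# (Se, Sp, PV+) copied from the reported identification results.
reported = {
    "logistic regression": (0.653, 0.840, 0.922),
    "BPNN": (0.900, 0.800, 0.929),
    "SVM (RBF)": (0.900, 0.920, 0.970),
}
for name, (se, sp, ppv) in reported.items():
    print(f"{name}: G-mean={g_mean(se, sp):.3f}, "
          f"F-measure={f_measure(ppv, se):.3f}")

# Made-up scores (not study data) to exercise the cut-off search.
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(toy_scores, toy_labels))
```

Because G-mean and F-measure are functions of Se, Sp and PV+ alone, recomputing them this way is a quick consistency check on a results table. The cut-off search is a brute-force scan over observed scores, which is adequate for a few hundred subjects.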