The Application Of Decision Tree Model In MDS And AA Differential Diagnosis

Posted on: 2017-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Lv
GTID: 2284330503991993
Subject: Public Health and Preventive Medicine

Abstract/Summary:
Objectives: To establish reliable classification rules, using decision tree models, that can guide or assist clinicians in the differential diagnosis of myelodysplastic syndrome (MDS) and aplastic anemia (AA).

Methods: A cohort of 780 patients with a clinically established diagnosis of MDS or AA was selected from 2000 patients treated at the Chinese Academy of Medical Sciences Hematonosis Hospital. For each patient, 203 items of information were collected, including basic information (name, age, occupation and nationality), hematological indexes, virus-related indexes, serum markers, and parameters obtained from blood smear, bone marrow smear, immunological, flow cytometry and stem cell colony culture tests, among other indicators. The collected information was entered into an Epi Data 3.1 database and exported to an Excel file, which was then used to build C5.0, CART and QUEST decision tree models with the SPSS Modeler 14.1 software package. To compare the predictive performance of the three models, we calculated the overall prediction accuracy, the average per-class accuracy, the precision, the sensitivity, the F1 measure, the specificity and Youden's index. Finally, the three decision tree models were combined to identify the best combined model, which was then compared with the best single decision tree model it contains.

Results: There was a significant age difference between the MDS and AA groups (χ²=47.411, P<0.001), whereas sex and ethnic composition were similar in both groups (P>0.05). Workers, farmers and students were the three main occupational groups affected, and occupational composition differed significantly between the two groups (χ²=39.063, P<0.001). All three decision tree models showed potential for identifying and classifying the two diseases.
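All of the evaluation metrics named in the Methods can be derived from a 2×2 confusion matrix. The sketch below is a minimal Python version, assuming MDS is the positive class and AA the negative class, as the Results state. The abstract gives no formulas, so defining average per-class accuracy as the mean of sensitivity and specificity (what scikit-learn's `balanced_accuracy_score` computes for two classes) is an assumption, though it reproduces the reported values. The class `Confusion` and function `metrics` are hypothetical names. The counts in the usage example are not reported in the abstract: they are back-calculated as the smallest integer counts (88 MDS and 72 AA test cases) consistent with the C5.0 figures reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 confusion matrix: MDS is the positive class, AA the negative class."""
    tp: int  # MDS cases classified as MDS
    fn: int  # MDS cases classified as AA
    tn: int  # AA cases classified as AA
    fp: int  # AA cases classified as MDS


def metrics(c: Confusion) -> dict:
    """Evaluation metrics named in the abstract, returned as proportions (not percentages)."""
    total = c.tp + c.fn + c.tn + c.fp
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)  # recall of the MDS class
    specificity = c.tn / (c.tn + c.fp)  # recall of the AA class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        # Assumed definition: mean of the two per-class recalls (balanced accuracy).
        "avg_per_class_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "youden": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Assumed counts, back-calculated from the reported C5.0 percentages
    # (88 MDS and 72 AA test cases); the abstract does not report them.
    c50 = Confusion(tp=77, fn=11, tn=48, fp=24)
    for name, value in metrics(c50).items():
        print(f"{name:>22}: {value:.4f}")
```

With these counts the function returns an accuracy of 0.78125, a precision of about 0.7624, an F1 measure of about 0.8148 and a Youden's index of about 0.5417, in line with the reported 78.12%, 76.24%, 81.48% and 0.54. Passing `Confusion(tp=77, fn=11, tn=51, fp=21)`, likewise back-calculated, reproduces the C5.0+QUEST figures (80%, 79.17%, 82.80% and 0.58).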
The prediction accuracies of the C5.0, CART and QUEST models were 78.12%, 73.75% and 76.88%, respectively, with no significant difference among the three models (P>0.05). The average per-class accuracies of the C5.0, QUEST and CART models were 77.08%, 75.44% and 73.48%, respectively. For MDS, treated as the positive class in this study, the precision of the C5.0, CART and QUEST models was 76.24%, 76.14% and 73.83%, respectively, again with no significant difference among the three models (P>0.05). The sensitivities of the QUEST, C5.0 and CART models were 89.77%, 87.50% and 76.14%, respectively, and differed significantly among the three models (χ²=7.161, P<0.05). The F1 measures of the C5.0, QUEST and CART models were 81.48%, 81.03% and 76.14%, respectively. For AA, treated as the negative class, the specificities of the C5.0, CART and QUEST models were 66.67%, 70.83% and 61.11%, respectively, with no significant difference among the three models (P>0.05). Youden's indices of the C5.0, QUEST and CART models were 0.54, 0.51 and 0.47, respectively, the highest being that of C5.0. Among the combined models, the C5.0+QUEST model reached a prediction accuracy of 80%, although the accuracies of the combined models did not differ significantly (P>0.05). Precision, sensitivity and specificity likewise did not differ significantly among the combined models, but the C5.0+QUEST model scored best on the composite indicators: its average per-class accuracy, F1 measure and Youden's index were 79.17%, 82.80% and 0.58, respectively, all higher than those of the single C5.0 model. In all three single decision tree models, the root node was the percentage of mature lymphocytes in bone marrow, measured by flow cytometry.

Conclusions: The C5.0, CART and QUEST models all achieved high prediction accuracy (73.75% to 78.12%) in this study. C5.0 was the best single classification model, with the highest composite indicators.
The C5.0+QUEST combined model improved on the single C5.0 model in the composite indicators and is the best of the models tested for assisting doctors in the differential diagnosis of MDS and AA. The percentage of lymphocytes measured by bone marrow flow cytometry and the primitive granulocyte count are key variables for differentiating MDS from AA.
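The sensitivity comparison in the Results (χ²=7.161, P<0.05) can be cross-checked with a Pearson chi-square test on a 3×2 table of detected and missed MDS cases. The minimal Python sketch below assumes an uncorrected Pearson test, which the abstract does not name, and uses counts the abstract does not report: 79, 77 and 67 of 88 MDS test cases detected by QUEST, C5.0 and CART, back-calculated from the reported sensitivities and consistent with the reported F1 measures. The function names `pearson_chi2` and `p_value_dof2` are hypothetical.

```python
import math


def pearson_chi2(table):
    """Uncorrected Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Upper-tail P value of a chi-square distribution with exactly 2 degrees of freedom."""
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return math.exp(-stat / 2)


if __name__ == "__main__":
    # Rows: QUEST, C5.0, CART. Columns: MDS cases detected, MDS cases missed.
    # Assumed counts back-calculated from the sensitivities (88 MDS test cases each).
    table = [[79, 9], [77, 11], [67, 21]]
    stat, dof = pearson_chi2(table)
    print(f"chi2 = {stat:.3f}, dof = {dof}, P = {p_value_dof2(stat):.4f}")
```

With these counts the statistic is 7.161 with 2 degrees of freedom and a P value of about 0.028, matching the reported result. The statistic depends only on the three detected/missed counts, not on which row is assigned to which model. For tables with other degrees of freedom, SciPy's `scipy.stats.chi2_contingency` computes the same statistic (its default Yates correction applies only when there is 1 degree of freedom).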
Keywords/Search Tags: myelodysplastic syndrome, aplastic anemia, decision tree, differential diagnosis