
Decision Tree And Its Application In Medicine

Posted on: 2005-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Xu
GTID: 2144360125468394
Subject: Epidemiology and Health Statistics
Abstract/Summary:
With the rapid development of information technology and the continuous growth of data in all kinds of databases, the query mechanisms of database management systems and traditional statistical methods fall far short of what is needed. Data mining emerged as a result. The decision tree, one of the algorithms used in data mining, is an inductive learning method: it works top-down, without backtracking, and repeatedly searches for the most important splitting variables. Its basic aim is to construct a simple, readable tree from a set of unordered cases according to a given rule, and its core techniques are growing and pruning the tree. Decision trees offer many advantages that other statistical and machine-learning methods lack, including the ability to extract large amounts of information from huge datasets.

This article centers on applying decision trees to real medical data: Traditional Chinese Medicine (TCM) syndromes in chronic gastritis, and follow-up data from patients after liver cancer surgery. The research uses decision trees to overcome shortcomings of traditional statistical methods, explores their use for classification and prediction in medicine, and offers better leads for analyzing complex medical data.

For the TCM data, the target variable was one of five syndromes and the sample size was 406. Bootstrap resampling was used to enlarge the sample to 2000 so that it met the requirements of data mining. A tree was then built on the information gain ratio using SAS Enterprise Miner. The model selected 33 important variables and produced 129 decision rules. Its proportion correctly classified was high: 83.90% on the training set, 79.84% on the validation set, and 80.75% on the test set. When used to classify new samples, the model showed good sensitivity and specificity for all five syndromes.
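The splitting criterion and the sample enlargement described above can be made concrete with a short sketch. The Python below computes the information gain ratio (the criterion C4.5 uses) for a categorical attribute, and enlarges a sample by drawing cases with replacement. It is not the SAS Enterprise Miner implementation used in the thesis; the function names and toy data are hypothetical, and the sketch ignores continuous attributes, missing values, and pruning.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain ratio of splitting `labels` on a categorical attribute.

    gain       = H(S) - sum_v |S_v|/|S| * H(S_v)
    split_info = -sum_v |S_v|/|S| * log2(|S_v|/|S|)
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # An attribute with a single value gives split_info == 0; treat it as useless.
    return gain / split_info if split_info > 0 else 0.0


def bootstrap_enlarge(rows, size, seed=0):
    """Hypothetical helper: draw `size` rows with replacement (e.g. 406 cases -> 2000)."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]


if __name__ == "__main__":
    # Hypothetical toy data: the attribute separates the two classes perfectly.
    print(gain_ratio(["a", "a", "b", "b"], [1, 1, 2, 2]))  # 1.0
    print(len(bootstrap_enlarge(list(range(406)), 2000)))  # 2000
```

Dividing the gain by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.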
Sensitivity and specificity were, respectively, 85.59% and 94.86% for syndrome 1, 64.71% and 93.37% for syndrome 2, 81.43% and 96.28% for syndrome 3, 89.19% and 97.65% for syndrome 4, and 69.88% and 92.82% for syndrome 5.

Syndrome research is a hot and difficult topic in TCM theory. There is no objective criterion for discriminating between syndromes and no objective index for weighting the contributing factors. This study achieved good results: it selected the important variables and ranked their importance, produced a set of comprehensible and applicable rules, and built a probabilistic model for diagnosing syndromes. All of these are highly valuable in clinical practice.

For the liver cancer patients after surgery, C4.5 and CART decision trees were compared with logistic regression. First, 11 variables were selected using two criteria: R-square (cutoff = 0.005) and Chi-square (cutoff = 3.84). Tree models were then built on the information gain ratio (C4.5) and the Gini index (CART), and a logistic regression model on the logit function. On the test set, the proportion correctly classified was 80.76% for C4.5, higher than 77.32% for CART and 70.45% for logistic regression, and C4.5 also had the largest area under the ROC curve. Furthermore, the decision tree produced rules that can be used to predict liver cancer recurrence. With many influencing factors and complex relationships among them, the tree outperformed logistic regression on these data, and C4.5 may be more appropriate than CART for this analysis.

It can be seen that decision trees are well suited to classification and prediction:
1. They handle missing data efficiently.
2. They discretize interval variables as part of the algorithm.
3. They handle different types of variables at the same time.
4. They train quickly and classify efficiently.
5. They select important variables and rank them.
6. They produce simple, easy-to-understand rules.
7. They can build probability-based models.

As decision trees are applied more deeply, they are expected to show important application value and broad prospects for development.
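Two quantities behind the comparisons above can be sketched briefly: the Gini impurity decrease that CART uses to score a binary split, and one-vs-rest sensitivity and specificity for a multi-class classifier such as the five-syndrome model. This is an illustrative Python sketch under stated assumptions, not the thesis's code: the function names and data are hypothetical, and computing per-syndrome sensitivity and specificity as one-vs-rest is an assumption about how those figures were obtained.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity decrease of a binary split, used to rank candidate CART splits.

    goes_left[i] is True if case i is sent to the left child node.
    """
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Hypothetical toy data.
    print(gini_decrease([1, 1, 2, 2], [True, True, False, False]))  # 0.5
    print(sensitivity_specificity([1, 1, 2, 2, 3], [1, 2, 2, 2, 1], positive=1))
    # (0.5, 0.6666666666666666)
```

CART grows binary trees and, at each node, picks the split with the largest impurity decrease. C4.5 instead maximizes the gain ratio and may split a categorical attribute into as many branches as it has values.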
Keywords/Search Tags: data mining, decision tree, classification of syndrome, prognostic prediction