
Decision Tree Based On Bayes' Theorem To Extract Exception Rules

Posted on: 2004-08-27  Degree: Master  Type: Thesis
Country: China  Candidate: K J Yuan  Full Text: PDF
GTID: 2208360092496673  Subject: Computer applications
Abstract/Summary:
Classification is an important task in data mining. Classification requires constructing a model (also called a classifier) that maps the records in a database to a particular class label. There are many ways to construct a classifier. Decision trees are widely applied in data mining because of their simplicity and concision compared with other classification models.

Massive data sets are common in data mining, so a traditional algorithm must address scalability: how to make the algorithm as effective on a large data set as on a small one. The main idea here is to reduce memory occupation in order to improve the efficiency of the algorithm. Accuracy is, of course, also important in classification. Traditional algorithms improve accuracy only by making the decision tree deeper or bushier, which leaves the final tree hard to comprehend and difficult to convert into rules.

In this paper, we begin with the accuracy of decision trees. We redefine the tree nodes of the traditional tree and introduce the concept of majority class leaf nodes: leaf nodes in which the proportion of some class in the class distribution is larger than an assumed threshold. A traditional algorithm terminates at such a node and marks it with the most common class label, which lowers accuracy: records in a majority class leaf node that actually belong to a minority class are assigned the common class label when classified by this model, so classification accuracy falls.

We extract exception rules in the majority class leaf nodes from a statistical point of view. Although this method achieves higher accuracy than other algorithms, it requires much time. Because of this, we define three data structures, the Attribute Value-Class List, the Exception Ratio Table, and the Exception Ratio Table Group, to simplify the computation.
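The two core notions above, a majority class leaf node and the exception ratio of an attribute value inside such a node, can be sketched as follows. This is a minimal illustration, not the thesis's actual data structures: the function names, the dictionary-based record format, and the default threshold are all assumptions made for the example.

```python
from collections import Counter

def is_majority_class_leaf(labels, threshold=0.9):
    """Return (True, majority_label) when one class's share of the node's
    records reaches the threshold, making it a majority class leaf node."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return count / len(labels) >= threshold, label

def exception_ratio(records, attr, value, minority_label, label_key="label"):
    """Estimate P(minority class | attribute value) within a node: the
    fraction of records with attr == value that carry the minority label.
    A high ratio suggests an exception rule "attr = value -> minority"."""
    matching = [r for r in records if r[attr] == value]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r[label_key] == minority_label)
    return hits / len(matching)

# A node where 9 of 10 records share one label is a majority class leaf
# at threshold 0.9; its single minority record would be misclassified
# unless an exception rule covers it.
flag, label = is_majority_class_leaf(["yes"] * 9 + ["no"], threshold=0.9)
```

The Attribute Value-Class List of the thesis would correspond to precomputing the `matching` counts per attribute value once per node, instead of rescanning the records for every candidate exception rule.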
At the same time we boost efficiency with a heuristic measure. To gain still higher efficiency, we adopt a pre-pruning measure that defines a split threshold: when a node's predicted value exceeds this threshold, the node is close to pure, so we stop splitting, since further splitting has little significance. This avoids the high cost of post-pruning, which requires many scans of data on disk and a large amount of CPU time, so we gain high efficiency.

Finally, we obtain a new algorithm, named Gen_DT_ER(), which adds exception rule extraction and the pre-pruning method to the typical algorithm ID3. We evaluate error rate, scalability, running time, and the number of tree nodes using 12-fold cross validation. Experiments demonstrate that the new algorithm greatly reduces the error rate while retaining good scalability; running time and the number of tree nodes are also reasonable. This idea can be applied to any other algorithm, and thus offers a new way to improve the accuracy of classification algorithms.
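The pre-pruning test and the fold partitioning used in the evaluation can be sketched in a few lines. Again this is an illustrative sketch under assumed names: purity here is taken as the largest class share, and the split threshold value is an arbitrary example, not the one used in the thesis.

```python
from collections import Counter

def should_stop_splitting(labels, split_threshold=0.95):
    """Pre-pruning test: if the node's purity (share of its largest class)
    already meets the split threshold, further splitting gains little,
    so growth stops here instead of relying on costly post-pruning."""
    counts = Counter(labels)
    purity = counts.most_common(1)[0][1] / len(labels)
    return purity >= split_threshold

def k_fold_indices(n, k=12):
    """Partition record indices 0..n-1 into k disjoint folds for k-fold
    cross validation (k = 12 in the experiments described above)."""
    return [list(range(i, n, k)) for i in range(k)]
```

Stopping growth with such a test trades a little potential accuracy for large savings, since post-pruning would first grow the full tree and then scan the data repeatedly to decide which subtrees to cut back.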
Keywords/Search Tags: majority class leaf nodes, exception rule, pruning, 12-fold cross validation