
Decision Tree Based On Bayes' Theorem To Extract Exception Rules

Posted on: 2004-08-27  Degree: Master  Type: Thesis
Country: China  Candidate: K J Yuan  Full Text: PDF
GTID: 2208360092496673  Subject: Computer applications
Abstract/Summary:
Classification is an important task in data mining. Classification requires constructing a model (also called a classifier) that maps the records in a database to a particular class label. There are many ways to construct a classifier. Decision trees are widely applied in data mining because of their simplicity and concision compared with other classification models.

Massive data sets are common in data mining, so a traditional algorithm must address scalability: how to make the algorithm as effective on a large data set as on a small one. The main idea here is to reduce memory occupation in order to improve the efficiency of the algorithm. Accuracy is, of course, also important in classification. Traditional algorithms improve accuracy only by making the decision tree deeper or bushier, which leaves the final tree hard to comprehend and difficult to convert into rules.

In this paper, we begin with the accuracy of decision trees. We redefine the tree nodes of the traditional tree and introduce the concept of majority class leaf nodes: leaf nodes in which the proportion of some class in the class distribution is larger than an assumed threshold. A traditional algorithm terminates at such a node and marks it with the most common class label, which lowers accuracy: records in a majority class leaf node that actually belong to a minority class are assigned the common class label when classified by this model, so classification accuracy falls.

We extract exception rules in the majority class leaf nodes from a statistical point of view. Although this method achieves higher accuracy than other algorithms, it requires much time. Because of this, we define three data structures, the Attribute Value-Class List, the Exception Ratio Table, and the Exception Ratio Table Group, to simplify the computation.
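The two core notions above, a majority class leaf node and the exception ratio of an attribute value inside such a node, can be sketched as follows. This is a minimal illustration, not the thesis's actual data structures: the function names, the dictionary-based record format, and the default threshold are all assumptions made for the example.

```python
from collections import Counter

def is_majority_class_leaf(labels, threshold=0.9):
    """Return (True, majority_label) when one class's share of the node's
    records reaches the threshold, making it a majority class leaf node."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return count / len(labels) >= threshold, label

def exception_ratio(records, attr, value, minority_label, label_key="label"):
    """Estimate P(minority class | attribute value) within a node: the
    fraction of records with attr == value that carry the minority label.
    A high ratio suggests an exception rule "attr = value -> minority"."""
    matching = [r for r in records if r[attr] == value]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r[label_key] == minority_label)
    return hits / len(matching)

# A node where 9 of 10 records share one label is a majority class leaf
# at threshold 0.9; its single minority record would be misclassified
# unless an exception rule covers it.
flag, label = is_majority_class_leaf(["yes"] * 9 + ["no"], threshold=0.9)
```

The Attribute Value-Class List of the thesis would correspond to precomputing the `matching` counts per attribute value once per node, instead of rescanning the records for every candidate exception rule.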
At the same time we boost efficiency with a heuristic measure. To gain still higher efficiency, we adopt a pre-pruning measure that defines a split threshold: when a node's predicted value exceeds this threshold, the node is close to pure, so we stop splitting, since further splitting has little significance. This avoids the high cost of post-pruning, which requires many scans of data on disk and a large amount of CPU time, so we gain high efficiency.

Finally, we obtain a new algorithm, named Gen_DT_ER(), which adds exception rule extraction and the pre-pruning method to the typical algorithm ID3. We evaluate error rate, scalability, running time, and the number of tree nodes using 12-fold cross validation. Experiments demonstrate that the new algorithm greatly reduces the error rate while retaining good scalability; running time and the number of tree nodes are also reasonable. This idea can be applied to any other algorithm, and thus offers a new way to improve the accuracy of classification algorithms.
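The pre-pruning test and the fold partitioning used in the evaluation can be sketched in a few lines. Again this is an illustrative sketch under assumed names: purity here is taken as the largest class share, and the split threshold value is an arbitrary example, not the one used in the thesis.

```python
from collections import Counter

def should_stop_splitting(labels, split_threshold=0.95):
    """Pre-pruning test: if the node's purity (share of its largest class)
    already meets the split threshold, further splitting gains little,
    so growth stops here instead of relying on costly post-pruning."""
    counts = Counter(labels)
    purity = counts.most_common(1)[0][1] / len(labels)
    return purity >= split_threshold

def k_fold_indices(n, k=12):
    """Partition record indices 0..n-1 into k disjoint folds for k-fold
    cross validation (k = 12 in the experiments described above)."""
    return [list(range(i, n, k)) for i in range(k)]
```

Stopping growth with such a test trades a little potential accuracy for large savings, since post-pruning would first grow the full tree and then scan the data repeatedly to decide which subtrees to cut back.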
Keywords/Search Tags: majority class leaf nodes, exception rule, pruning, 12-fold cross validation