
Research Of Decision Tree Optimizing And Association Rule Mining Algorithms

Posted on: 2011-08-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Y Li
GTID: 2178360302999231 | Subject: Computer Science and Technology
Abstract/Summary:
Data mining is a key research topic in artificial intelligence and machine learning. Current research on mining methods focuses on feature extraction, attribute reduction, algorithm efficiency, classification accuracy, and the improvement and application of these methods in specific domains. Among the many data mining methods, association rule mining and decision tree classification are characterized by short running times, low computational cost, and easily interpretable results, so both are promising for theoretical research and practical application. Based on an in-depth analysis of existing algorithms, this thesis proposes corresponding improved algorithms. Comparative experiments on UCI datasets show that the improved algorithms achieve good results. The main work covers the following three aspects:

(1) Traditional frequent itemset mining algorithms generate a large number of short patterns, whereas users are mainly interested in rules generated from long patterns. This thesis therefore proposes a frequent closed itemset mining algorithm based on an antecedent-consequent constraint and a length-decreasing support constraint. Experimental results show that the algorithm greatly reduces the number of frequent itemsets and improves mining efficiency.

(2) Existing decision tree algorithms have the following shortcomings: attribute selection is difficult, the algorithms are easily affected by noisy data, and their generalization ability is low. Decision tree construction algorithms based on the variable precision rough set (VPRS) achieve better classification results and tolerate noisy data. This thesis analyzes the existing VPRS-based decision tree algorithms and, to address their deficiencies, proposes a new attribute selection criterion called attribute significance. This criterion jointly considers the weighted approximation accuracy, the information gain, and the number of attribute values at the current node. The decision tree algorithm based on attribute significance (CGVPRSDT) can improve classification accuracy.

(3) This thesis analyzes the disadvantages of existing decision tree algorithms for multi-valued and multi-labeled data and proposes a novel decision tree algorithm for such data. The algorithm introduces a new formula for computing similarity based on non-noise label-sets; the formula considers both the similarity between two label-sets and the impact of noisy data. The algorithm also improves the conditions under which a node stops splitting. Comparative experiments show that the improved algorithm not only reduces the impact of noisy data on the similarity value but also achieves higher predictive accuracy.
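Item (1) of the abstract combines two constraints that it names but does not define. The following Python sketch is one possible reading and not the thesis's algorithm. The linear threshold `min_support(length, base, step, floor)` and the rule filter `constrained_rules` are both hypothetical. That filter allows antecedents to use only items from one chosen set and consequents only items from another. Itemsets are enumerated by brute force. The reason is that a support threshold that decreases with length is no longer anti-monotone: a long itemset can pass its lower threshold while a subset fails its higher one, so plain Apriori pruning would discard valid long itemsets.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def min_support(length, base=3, step=1, floor=2):
    """Hypothetical length-decreasing threshold: longer itemsets need less support."""
    return max(floor, base - step * (length - 1))


def closed_frequent_itemsets(transactions, min_sup=min_support):
    """Brute-force search over all itemsets; only suitable for a handful of items."""
    items = sorted(set().union(*transactions))
    sup = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = support(s, transactions)
            if c > 0:
                sup[s] = c
    # The threshold depends on itemset length, so no downward-closure pruning.
    frequent = {s: c for s, c in sup.items() if c >= min_sup(len(s))}
    # Closed: no proper superset has exactly the same support.
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and sup[t] == c for t in sup)
    }


def constrained_rules(closed, transactions, antecedent_items, consequent_items,
                      min_conf=0.6):
    """Hypothetical antecedent-consequent constraint: the antecedent may use only
    `antecedent_items` and the consequent only `consequent_items` (disjoint sets)."""
    rules = []
    for itemset, c in closed.items():
        ante = itemset & antecedent_items
        cons = itemset & consequent_items
        # Skip itemsets that cannot be split into a valid antecedent/consequent.
        if not ante or not cons or ante | cons != itemset:
            continue
        conf = c / support(ante, transactions)
        if conf >= min_conf:
            rules.append((ante, cons, conf))
    return rules


if __name__ == "__main__":
    T = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}, {"c", "d"}]
    closed = closed_frequent_itemsets(T)
    print(closed)
    print(constrained_rules(closed, T, {"a", "c"}, {"b", "d"}))
```

On the five sample transactions, the closed frequent itemsets are {a, b}, {c}, {d}, and {c, d}. The short patterns {a} and {b} are absorbed into the closed itemset {a, b}, and {c, d} (support 2) survives only because the length-2 threshold is lower than the length-1 threshold of 3. With antecedent items {a, c} and consequent items {b, d}, two rules remain: a → b (confidence 1.0) and c → d (confidence 2/3).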
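The attribute significance criterion in item (2) is described only qualitatively. The sketch below computes its three stated ingredients: weighted approximation accuracy, information gain, and the number of attribute values at the node. It then combines them with a made-up formula, `(accuracy + gain) / log2(k + 1)` for an attribute with `k` values. The abstract does not say how the thesis actually weights them. Approximation accuracy follows a common VPRS formulation with a precision threshold `beta` in (0.5, 1]. Weighting each decision class by its relative frequency is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def partition(rows, attr):
    """Equivalence classes (blocks) of row indices induced by one attribute."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[row[attr]].append(i)
    return list(blocks.values())


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    n = len(rows)
    conditional = sum(
        len(b) / n * entropy([labels[i] for i in b]) for b in partition(rows, attr)
    )
    return entropy(labels) - conditional


def weighted_vprs_accuracy(rows, labels, attr, beta=0.8):
    """|lower| / |upper| of each decision class under VPRS with precision
    threshold beta, weighted by class frequency (weighting is an assumption)."""
    n = len(rows)
    blocks = partition(rows, attr)
    total = 0.0
    for cls, size in Counter(labels).items():
        lower = upper = 0
        for b in blocks:
            ratio = sum(1 for i in b if labels[i] == cls) / len(b)
            if ratio >= beta:      # block lies "mostly inside" the class
                lower += len(b)
            if ratio > 1 - beta:   # block overlaps the class non-negligibly
                upper += len(b)
        if upper:
            total += (size / n) * (lower / upper)
    return total


def attribute_significance(rows, labels, attr, beta=0.8):
    """Hypothetical combination: reward accuracy and gain, penalize many values."""
    k = len(partition(rows, attr))
    score = (weighted_vprs_accuracy(rows, labels, attr, beta)
             + information_gain(rows, labels, attr))
    return score / math.log2(k + 1)


if __name__ == "__main__":
    rows = [{"A": "x", "B": "p"}, {"A": "x", "B": "q"},
            {"A": "y", "B": "p"}, {"A": "y", "B": "q"}]
    labels = ["yes", "yes", "no", "no"]
    for a in ("A", "B"):
        print(a, round(attribute_significance(rows, labels, a), 3))
```

Noise tolerance comes from `beta`. With `beta=0.8`, a block in which 4 of 5 rows share a class still counts toward that class's lower approximation. Classical rough sets (`beta=1`) would reject it.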
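Item (3) does not spell out the similarity formula or the new stopping conditions. As an illustration only, the sketch below treats a label-set as noise when it occurs in fewer than a fraction `min_freq` of the records at a node. It measures label-set similarity with the Jaccard coefficient and averages it over pairs of the remaining (non-noise) label-sets. The names `non_noise_labelsets`, `node_similarity`, and `should_stop`, and all thresholds, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Similarity of two label-sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def non_noise_labelsets(labelsets, min_freq=0.25):
    """Hypothetical noise rule: drop label-sets that occur in fewer than
    a fraction min_freq of the node's records."""
    sets = [frozenset(s) for s in labelsets]
    counts = Counter(sets)
    return [s for s in sets if counts[s] / len(sets) >= min_freq]


def node_similarity(labelsets, min_freq=0.25):
    """Average pairwise Jaccard similarity over the non-noise label-sets."""
    kept = non_noise_labelsets(labelsets, min_freq)
    if len(kept) < 2:
        return 1.0  # zero or one label-set is trivially homogeneous
    pairs = list(combinations(kept, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def should_stop(labelsets, sim_threshold=0.8, min_size=3, min_freq=0.25):
    """Hypothetical stopping rule: stop when the node is small or homogeneous."""
    return (len(labelsets) < min_size
            or node_similarity(labelsets, min_freq) >= sim_threshold)


if __name__ == "__main__":
    node = [{"a", "b"}] * 4 + [{"c"}]
    print(node_similarity(node), node_similarity(node, min_freq=0.0))
    print(should_stop(node), should_stop(node, min_freq=0.0))
```

Consider a node whose records carry {a, b} four times and {c} once. Filtering out the rare label-set gives a similarity of 1.0, so the hypothetical stopping rule fires. Averaging over all ten pairs without filtering (`min_freq=0.0`) would give 0.6, and the node would keep splitting.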
Keywords/Search Tags: Data Mining, Variable Precision Rough Set, Decision Tree, Multi-labeled Data, Association Rule