Research Of Decision Tree Algorithm Based On Rough Sets And Gray Theory

Posted on: 2011-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: X M Liu
GTID: 2178360305961231
Subject: Computer application technology
Abstract/Summary:
In practical applications of decision tree classification algorithms, data sets often have missing attribute values or many redundant attributes, and existing methods for selecting branch attributes tend to generate too many rules, which often produces a larger decision tree. It is therefore of theoretical and practical significance to improve the decision tree algorithm further and make it better suited to the requirements of data mining applications. This paper studies the filling of missing attribute values, attribute reduction, and the selection of decision tree branch attributes. Firstly, filling methods based on the K-nearest-neighbor algorithm do not consider whether the filled values will cause data conflicts, and they need several attempts to select a value of K, which may still not be optimal; some filling methods also fill missing attribute values over the entire data set, which may lead to serious errors in practice. In response to these deficiencies, this paper combines gray theory and rough set theory to fill missing attribute values, yielding the GRFill algorithm. It also implements the average filling method and a nearest-neighbor filling method based on Euclidean distance, and compares GRFill with these two methods through the performance of decision trees built by the C4.5 algorithm. Secondly, this paper improves the attribute reduction algorithm based on the discernibility matrix, which suffers from high time and space complexity, and implements the RSredu attribute reduction algorithm, which removes redundant attributes and improves decision tree performance.
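The two baseline filling methods named above, and the RMSE measure used to compare filling methods in the experiments, can be illustrated with a minimal sketch. This is not the GRFill algorithm, whose details the abstract does not give. The function names (`mean_fill`, `nn_fill`, `rmse`), the use of `None` to mark a missing value, and the toy data are all assumptions made for illustration.

```python
import math

def mean_fill(rows):
    """Replace each None with the mean of the observed values in its column.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def nn_fill(rows):
    """Replace each None with the value from the nearest complete row.
    Distance is Euclidean over the attributes observed in the incomplete row.
    Assumes at least one complete row exists."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        known = [j for j, v in enumerate(r) if v is not None]
        nearest = min(
            complete,
            key=lambda c: math.sqrt(sum((r[j] - c[j]) ** 2 for j in known)),
        )
        filled.append([nearest[j] if v is None else v for j, v in enumerate(r)])
    return filled

def rmse(filled, truth, missing_cells):
    """Root mean square error over the cells that were originally missing."""
    squared = [(filled[i][j] - truth[i][j]) ** 2 for i, j in missing_cells]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy data: the true value 4.0 at cell (1, 1) has been hidden.
truth = [[1.0, 2.0], [2.0, 4.0], [9.0, 18.0]]
data = [[1.0, 2.0], [2.0, None], [9.0, 18.0]]
missing = [(1, 1)]

print(rmse(mean_fill(data), truth, missing))  # 6.0
print(rmse(nn_fill(data), truth, missing))    # 2.0
```

On this toy data the nearest-neighbor fill has the lower RMSE because the column mean is pulled upward by the outlying third row; this says nothing about how the methods rank on real data.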
Thirdly, this paper defines the classification relationship between the condition attributes and the decision attribute based on rough set theory, and derives the RDTree(data, C, D, stop) algorithm, which uses the consistency between the condition attributes and the decision attribute as the criterion for selecting branch attributes. Experimental results show that the RMSE of the GRFill algorithm is lower than that of the average filling method and of the nearest-neighbor method based on Euclidean distance, and that its classification accuracy is higher than that of both methods; that the size of the decision tree is reduced when classification is performed after RSredu attribute reduction; and that the RDTree(data, C, D, stop) algorithm generates fewer leaves and fewer total nodes than the C4.5 algorithm, while its classification accuracy and average running time are almost equal to those of C4.5. Lastly, this paper combines the three studies above into the combined, optimized RGDTree(data, C, D, stop) decision tree classification algorithm, implements it on the WEKA platform, and validates its classification performance on standard UCI data sets and a data sample from the FoodMart2000 database. The experimental results indicate that the research in this paper helps improve the performance of decision tree classifiers.
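The idea of using consistency between condition attributes and the decision attribute as a branch selection criterion can be illustrated with the standard rough-set degree of dependency: the fraction of objects whose equivalence class under the chosen condition attributes maps to a single decision value. The abstract does not state the exact criterion used by RDTree, so the sketch below is an assumption rather than the thesis's definition, and the function names and toy data are hypothetical.

```python
from collections import defaultdict

def consistency(rows, attrs, decision):
    """Fraction of rows lying in equivalence classes (under `attrs`)
    that contain only one value of the decision attribute."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in attrs)].append(r[decision])
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(rows)

def best_branch_attribute(rows, candidates, decision):
    """Pick the single condition attribute with the highest consistency."""
    return max(candidates, key=lambda a: consistency(rows, [a], decision))

# Hypothetical toy data: "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

print(consistency(rows, ["outlook"], "play"))                 # 1.0
print(consistency(rows, ["windy"], "play"))                   # 0.0
print(best_branch_attribute(rows, ["outlook", "windy"], "play"))  # outlook
```

An attribute with consistency 1.0 splits the data into pure subsets in a single step, which is one plausible reason a criterion of this kind could yield smaller trees than an entropy-based one.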
Keywords/Search Tags:Decision Tree, Attribute missing values filling, Attribute Reduct, Rough Sets, Gray Theory