
Research On Decision Tree Algorithm Based On Rough Sets And Ensemble Learning

Posted on: 2020-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
Full Text: PDF
GTID: 2428330590453155
Subject: Computer technology
Abstract/Summary:
The decision tree algorithm is a widely used and easily understood classification algorithm that can be applied to both regression and classification problems. Typical decision tree algorithms include ID3, C4.5, and CART. Decision tree algorithms place relatively low requirements on the attribute types of the training set: they can handle both discrete and continuous attributes, and they are efficient and easy to use. However, decision trees are prone to over-fitting, which weakens the generalization ability of the trained model, and when dealing with large-scale data it is difficult to obtain good classification results. To address these problems, this thesis combines rough set theory with an in-depth analysis of decision tree algorithms and proposes a decision tree algorithm based on rough granularity and a decision tree algorithm based on harmonic granularity decision entropy. In addition, as one of the most effective ensemble learning algorithms, the random forest algorithm has many advantages, such as requiring little parameter tuning, running quickly, and resisting over-fitting. Therefore, this thesis also studies the random forest algorithm and ensembles the proposed decision tree algorithm to obtain a better random forest classification model. The main research work of this thesis is as follows:

(1) Based on the existing "harmonic average optimization-based splitting attribute selection criterion", this thesis proposes a new splitting attribute selection criterion, rough granularity, and uses it to construct a decision tree algorithm, DT-RGS. Rough granularity is obtained by averaging the knowledge granularity and the roughness from rough set theory. While constructing the decision tree, the attribute yielding the largest decrease in rough granularity is always selected as the splitting attribute; this attribute divides the data set into multiple subsets, and each subset is processed recursively until the predefined termination condition is satisfied. The effectiveness of the proposed algorithm is verified by comparing DT-RGS with traditional decision tree algorithms on UCI datasets.

(2) The traditional ID3 algorithm has several problems; for instance, its classification results are biased toward attributes with many values, and its computational cost is high. To address these problems, the concepts of rough set theory and knowledge granularity are introduced into decision tree classification, and a new information entropy model, harmonic granularity decision entropy, is proposed in this thesis. Based on this entropy, we propose a decision tree algorithm, DTHGE. Rough set theory is an effective tool for dealing with incomplete and uncertain data: it can learn implicit knowledge from inconsistent, inaccurate, and incomplete data and extract potential rules. By combining the roughness measure from rough sets with the decision tree, the uncertainty of information can be measured effectively. In addition, knowledge granularity can effectively measure the degree of refinement of information and knowledge. Therefore, by combining roughness, knowledge granularity, and the decision tree algorithm, the information contained in the data set can be described more comprehensively and fully, which improves the classification accuracy of the decision tree algorithm. In experiments on UCI datasets, the DTHGE algorithm is compared with traditional decision tree algorithms and with the DT-RGS algorithm proposed in (1). Both DTHGE and DT-RGS perform better than the traditional decision tree algorithms, and the proposed algorithms are better suited to large-scale datasets.

(3) Ensemble learning can effectively improve the generalization ability of a classification model, and among ensemble methods the random forest algorithm has received much attention due to its superior performance. Based on the DTHGE algorithm proposed in (2), this thesis further proposes a random forest algorithm, SFHGDE. First, N data sets are generated from the original training set by random sampling with replacement. Second, a poor-attribute removal strategy filters the initial attribute set: the significance of each attribute is calculated, and attributes whose significance falls below a given threshold are deleted, so that they cannot be chosen as splitting attributes. Third, a decision tree is constructed on each training set using the DTHGE algorithm proposed in (2), yielding N decision trees; during each selection of a splitting attribute, K attributes are randomly drawn from the filtered attribute set, and the best splitting attribute is chosen from these K. Finally, the N decision trees are combined by voting to obtain the decision forest. Experiments on UCI data sets show that the SFHGDE algorithm performs better than existing algorithms.
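The rough granularity criterion of (1) can be sketched in a few lines. The abstract states only that rough granularity averages knowledge granularity and roughness; the exact averaging is not specified, so the sketch below assumes a plain arithmetic mean, and uses the standard rough-set definitions (knowledge granularity GK(P) = Σ|Xᵢ|²/|U|² over the blocks of a partition, and roughness 1 − |lower(X)|/|upper(X)| averaged over the decision classes). The function names and the toy weather table are illustrative, not from the thesis.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of row indices induced by the values of attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def knowledge_granularity(rows, attrs):
    """GK(P) = sum |X_i|^2 / |U|^2 over the blocks X_i of the partition."""
    n = len(rows)
    return sum(len(b) ** 2 for b in partition(rows, attrs)) / (n * n)

def roughness(rows, attrs, decision):
    """Roughness 1 - |lower(X)| / |upper(X)|, averaged over decision classes X."""
    blocks = partition(rows, attrs)
    classes = partition(rows, [decision])
    total = 0.0
    for x in classes:
        lower = sum(len(b) for b in blocks if b <= x)   # blocks fully inside X
        upper = sum(len(b) for b in blocks if b & x)    # blocks touching X
        total += 1.0 - lower / upper
    return total / len(classes)

def rough_granularity(rows, attrs, decision):
    # Arithmetic mean is an assumption; the thesis may average differently.
    return 0.5 * (knowledge_granularity(rows, attrs) + roughness(rows, attrs, decision))

# Toy split: the attribute with the smallest rough granularity after the split
# (i.e. the largest decrease) is taken as the splitting attribute.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "yes"},
]
best = min(["outlook", "windy"], key=lambda a: rough_granularity(rows, [a], "play"))
```

Here `outlook` determines the class exactly, so its roughness is zero and it is selected over `windy`.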
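The multi-value bias of ID3 mentioned in (2) is easy to demonstrate with the standard ID3 information gain (this is textbook ID3, not the thesis's DTHGE, whose formula the abstract does not give): an ID-like attribute with a unique value per row achieves the maximal possible gain even though it generalizes to nothing.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy H = -sum p * log2(p) of a label multiset."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, decision):
    """ID3 information gain: H(D) - sum_v |D_v|/|D| * H(D_v)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[decision])
    remainder = sum(len(v) / len(rows) * entropy(v) for v in groups.values())
    return entropy([r[decision] for r in rows]) - remainder

# The "id" attribute splits every row into its own pure singleton, so its
# remainder term is zero and its gain equals the full base entropy -- the
# bias toward many-valued attributes that motivates DTHGE.
rows = [
    {"id": 1, "windy": "no",  "play": "no"},
    {"id": 2, "windy": "yes", "play": "no"},
    {"id": 3, "windy": "no",  "play": "yes"},
    {"id": 4, "windy": "yes", "play": "yes"},
]
```

On this table the base entropy is 1 bit, `id` receives the full gain of 1.0, and the genuinely uninformative `windy` receives 0.0 — yet `id` is useless on unseen data.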
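The SFHGDE pipeline of (3) — bootstrap N training sets, filter the attribute set by significance, draw K candidate attributes per split, combine by majority vote — can be sketched as follows. One-level trees with a simple purity criterion stand in for DTHGE, and attribute significance is taken as the standard rough-set dependency-degree drop SIG(a) = γ(C, D) − γ(C∖{a}, D); the thesis's exact significance measure and threshold are not given in the abstract, so all names and parameters here are assumptions.

```python
import random
from collections import Counter, defaultdict

def partition(rows, attrs):
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """gamma(C, D): fraction of rows lying in blocks pure w.r.t. the decision."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({rows[i][decision] for i in b}) == 1)
    return pos / len(rows)

def filter_attrs(rows, attrs, decision, threshold=0.0):
    """Keep attributes whose significance (dependency drop) exceeds the threshold."""
    full = dependency(rows, attrs, decision)
    return [a for a in attrs
            if full - dependency(rows, [x for x in attrs if x != a], decision) > threshold]

def train_stump(sample, attrs, k, decision, rng):
    """One-level tree over K random candidates (purity stands in for DTHGE)."""
    cand = rng.sample(attrs, min(k, len(attrs)))
    def purity(a):
        groups = defaultdict(list)
        for r in sample:
            groups[r[a]].append(r[decision])
        return sum(max(Counter(v).values()) for v in groups.values())
    best = max(cand, key=purity)
    leaf = defaultdict(list)
    for r in sample:
        leaf[r[best]].append(r[decision])
    default = Counter(r[decision] for r in sample).most_common(1)[0][0]
    return best, {v: Counter(l).most_common(1)[0][0] for v, l in leaf.items()}, default

def train_forest(rows, attrs, decision, n_trees=9, k=1, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap with replacement
        models.append(train_stump(sample, attrs, k, decision, rng))
    return models

def predict(models, row):
    votes = [leaf.get(row.get(a), d) for a, leaf, d in models]
    return Counter(votes).most_common(1)[0][0]

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "yes"},
]
filtered = filter_attrs(rows, ["outlook", "windy"], "play")
models = train_forest(rows, filtered, "play")
```

On this table `windy` adds nothing once `outlook` is known, so the significance filter removes it before any trees are grown — the "avoid ever choosing poor attributes as splitting attributes" step the abstract describes.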
Keywords/Search Tags: decision tree, rough sets, rough granularity, harmonic granularity decision entropy, random forest