Research On Incremental Learning Algorithm Of Decision Tree For Intelligence Large Data

Posted on: 2018-10-23  Degree: Master  Type: Thesis
Country: China  Candidate: J Sun  Full Text: PDF
GTID: 2348330542990795  Subject: Computer Science and Technology
Abstract/Summary:
The decision tree is one of the most popular classification methods because it is easy to understand. However, the trees built by existing methods are usually too large and complicated, which limits their practicality in some applications. Data also tends to grow over time. With a traditional decision tree induction algorithm, the historical data and the new data must be combined and learned again whenever a new batch of samples arrives. This approach discards previously acquired knowledge, so the decision tree learned earlier serves no purpose. Incremental learning, that is, using the new samples to update the existing decision tree, therefore becomes particularly important.

In this thesis, a new decision tree algorithm named NOLCDT is proposed, based on a study of decision trees and incremental learning methods. Before splitting a node, NOLCDT merges the attribute values of each candidate attribute at that node into two groups. It then selects the candidate attribute with the largest information gain and divides the node into two branches, which avoids generating too many branches and thus prevents the decision tree from becoming too complex. NOLCDT also improves the selection of the next node to split: it computes the splitting measure for all candidate splits and always selects the node with the largest measure as the next node to split, so that each split yields the greatest information gain. In addition, based on ID5R, an improved algorithm, IID5R, is proposed that evaluates the quality of classification attributes and estimates the minimum number of steps for which the selection of these attributes is guaranteed.

Combining NOLCDT with IID5R, an improved hybrid classifier algorithm, HCS, is proposed. HCS consists of two phases: building the initial decision tree and incremental learning. The initial decision tree is built with NOLCDT, and incremental learning is then performed with IID5R. HCS thus combines the advantages of decision trees and incremental learning: the result is easy to understand and suitable for incremental updates. Comparative experiments between a traditional decision tree algorithm and HCS on UCI data show that HCS handles the incremental problem well: the resulting decision tree is simpler and therefore easier to understand, and the incremental phase consumes less time.
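
The two-group split described for NOLCDT can be illustrated with a minimal Python sketch. The abstract does not say how attribute values are merged into two groups, so the exhaustive search below is an assumption, and the function names (entropy, binary_split_gain, best_two_group_split, choose_attribute) are hypothetical and not taken from the thesis. The sketch only shows the idea of scoring binary groupings by information gain and picking the attribute whose best grouping scores highest.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def binary_split_gain(values, labels, left_group):
    """Information gain when values in left_group go left and the rest go right."""
    left = [y for v, y in zip(values, labels) if v in left_group]
    right = [y for v, y in zip(values, labels) if v not in left_group]
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


def best_two_group_split(values, labels):
    """Search all two-group partitions of the distinct attribute values.

    Assumption: exhaustive search, which is exponential in the number of
    distinct values. Returns (gain, left_group); the gain is 0.0 with an
    empty group when no split improves on the parent node.
    """
    distinct = sorted(set(values))
    best = (0.0, frozenset())
    if len(distinct) < 2:
        return best
    anchor, rest = distinct[0], distinct[1:]
    # Keep the first value on the left so each partition is visited once,
    # and stop before the left group would contain every value.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            group = frozenset((anchor,) + extra)
            gain = binary_split_gain(values, labels, group)
            if gain > best[0]:
                best = (gain, group)
    return best


def choose_attribute(columns, labels):
    """Pick the attribute whose best two-group split has the largest gain.

    columns maps an attribute name to its list of values, one per sample.
    Returns (attribute_name, gain, left_group).
    """
    scored = ((name,) + best_two_group_split(vals, labels) for name, vals in columns.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    columns = {
        "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
        "windy": ["yes", "no", "yes", "no", "yes", "no"],
    }
    labels = ["no", "no", "yes", "yes", "yes", "yes"]
    print(choose_attribute(columns, labels))
```

In this toy data, grouping "overcast" with "rain" separates the classes perfectly, so "outlook" is chosen and the node gets exactly two branches even though the attribute has three values. For attributes with many distinct values, a greedy or heuristic merge would be needed in place of the exhaustive search.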
Keywords/Search Tags: Classification, Data Mining, Decision Tree, Incremental Learning