Research On Model Decision Tree Method

Posted on: 2020-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Yin
GTID: 2428330578969046
Subject: Software engineering
Abstract/Summary:
The rapid development of the information age has made data easier to collect and transmit, and the scale of data is growing exponentially. Data at this scale holds great value, so the analysis and use of big data is particularly important. Data classification is an important task in machine learning, with applications such as spam detection, image recognition, face recognition, and voice recognition. The Decision Tree (DT) has been widely used for classification because it analyzes data efficiently and produces output that is easy to understand. However, because a decision tree is constructed recursively, training is slow when the data set is large, and splitting the tree too finely may lead to overfitting. The study of efficient decision tree construction algorithms therefore still has important application value. This thesis addresses these issues with two contributions:

(1) A model decision tree method. To address the long running time and low efficiency caused by recursive construction of the Decision Tree, this thesis proposes the Model Decision Tree (MDT) algorithm. MDT uses the Gini index to grow an incomplete decision tree on the training data set, then uses a simple classification model to classify the impure pseudo leaf nodes (nodes at which growth stopped even though their samples do not all belong to one class), and in this way produces the final decision tree. Compared with DT, the Model Decision Tree improves training efficiency with little or no loss of accuracy.

(2) A model decision forest method. Although the Model Decision Tree improves the training efficiency of the Decision Tree algorithm, its accuracy decreases as the size of the impure pseudo leaf nodes increases. The Model Decision Forest (MDF) algorithm is therefore proposed to address the lower accuracy of the Model Decision Tree algorithm. MDF takes the Model Decision Tree as the base classifier and uses the idea of random forest to generate multiple model decision trees. The algorithm first obtains different sample subsets through a rotation matrix, then trains a different model decision tree on each subset, and finally combines the trees by voting. Experimental results on benchmark data sets show that the proposed Model Decision Forest achieves higher classification accuracy than the Model Decision Tree algorithm.

In summary, this thesis studies the long training time and low training efficiency caused by recursion in classical decision tree construction and proposes two new decision tree models, which not only improve the efficiency of tree construction but also provide some resistance to overfitting. The results enrich research on decision tree algorithms and have good application potential.
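
The MDT idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say which simple model is fitted at the pseudo leaf nodes or when tree growth stops, so the nearest-centroid classifier, the min_size stopping threshold, and all names used here (gini, best_split, centroid_model, build_mdt, predict) are assumptions made for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so neither side of a split is ever empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def centroid_model(X, y):
    """Assumed 'simple model': predict the class with the nearest centroid."""
    groups = {}
    for row, lab in zip(X, y):
        groups.setdefault(lab, []).append(row)
    cents = {lab: [sum(col) / len(rows) for col in zip(*rows)]
             for lab, rows in groups.items()}

    def predict_one(x):
        return min(cents, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(x, cents[lab])))
    return predict_one


def build_mdt(X, y, min_size=4):
    """Grow an incomplete tree; impure nodes that are not split get a model."""
    if len(set(y)) == 1:
        label = y[0]
        return {"leaf": lambda x: label}           # pure leaf
    # Assumed stopping rule: nodes with at most min_size samples are not split.
    split = best_split(X, y) if len(y) > min_size else None
    if split is None:
        return {"leaf": centroid_model(X, y)}      # impure pseudo leaf
    f, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_mdt([r for r, _ in left], [lab for _, lab in left], min_size),
        "right": build_mdt([r for r, _ in right], [lab for _, lab in right], min_size),
    }


def predict(node, x):
    """Route x down the tree, then ask the leaf (constant or model) for a label."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"](x)


# Toy data: two well-separated clusters (made up for this example).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
tree = build_mdt(X, y, min_size=4)
print(predict(tree, [0.5, 0.5]), predict(tree, [5.5, 5.5]))
```

Raising min_size stops growth earlier, so more of the classification work moves from recursive splitting to the leaf models. This is the trade-off the abstract describes: shorter tree construction, with accuracy depending on how well the simple model handles larger impure nodes.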
Keywords: Decision Tree, Decision Forest, MDT, MDF, Gini index