Improved Algorithm And Application Of CART Decision Tree Based On GA

Posted on: 2021-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liu
GTID: 2438330611953985
Subject: Computer technology
Abstract/Summary:
With the continuous development of Internet technology since the middle of the 20th century, information technology has advanced rapidly, and users now generate large amounts of images, text, audio, video and other information anytime and anywhere. How can useful information be found in this growing body of data? Data mining emerged to answer this question. Data mining continually extracts useful information from data: it uses various analysis methods and tools to establish connections between mathematical models and the data, and then uses the constructed models to analyze the data and make predictions. Classification prediction occupies an important position in data mining. It learns the relationships between attributes and classes in known data and uses them to predict the classes of unknown data.

The decision tree is an easy-to-understand and widely used classification algorithm. Compared with other methods, decision trees offer fast prediction, high accuracy and easy generation of classification rules, so they are commonly used for classification prediction. The commonly used decision tree algorithms are ID3, C4.5 and CART. ID3 is suitable for small-scale data sets and cannot handle continuous attributes. C4.5 addresses the shortcomings of ID3: it can also handle continuous attributes, and it adds a preliminary form of regularization when pruning to prevent overfitting. However, C4.5 can only handle classification problems, not regression problems. CART improves on this: it handles both classification and regression, and it uses the Gini index in place of the information gain ratio as the splitting criterion, which reduces the amount of computation.

In most cases, the decision tree model constructed by CART is more accurate than models constructed by other algorithms, and the larger the sample, the more complex the data and the more variables there are, the more significant its advantage. This paper therefore uses CART to construct the decision tree. But CART has its own shortcomings. It splits nodes by dichotomy (binary splitting), and the biggest drawback of dichotomy is local optimization: each calculation finds only the optimal value for the current step, so the search easily falls into local convergence. Solving this local optimization problem is the starting point of this paper. In addition, when the number of feature attributes in the data set is too large, training takes longer and the trained model is more complicated, so the generalization ability of the model decreases. Experiments confirm that using a genetic algorithm to select the best features for building the tree can greatly improve classification accuracy.

The genetic algorithm is a global optimization search algorithm that finds the optimal individual through repeated selection, crossover and mutation operations. The innovation of this paper is to exploit this global search capability: the optimal split point is found by the genetic algorithm, thereby optimizing CART. Because of their excellent performance, genetic algorithms are widely used in optimization problems. They are relatively mature at finding optimal classification rules, and a decision tree algorithm ultimately produces classification rules, so improving the decision tree with a genetic algorithm is feasible. Although a genetic algorithm cannot in theory guarantee that the optimum is found, it makes further optimization possible, and the subsequent experiments show that using the genetic algorithm instead of dichotomy to find the optimal split point improves classification accuracy.
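
The idea of searching for a split point with selection, crossover and mutation can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which the abstract does not describe. The sketch assumes a single numeric feature, a real-valued threshold as the individual, weighted Gini impurity as the fitness to minimize, truncation selection and arithmetic crossover. All function names and parameters (`gini`, `split_cost`, `ga_best_threshold`, `pop_size`, `mutation_rate`) are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_cost(xs, ys, threshold):
    """Weighted Gini impurity of the binary split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def ga_best_threshold(xs, ys, pop_size=20, generations=30,
                      mutation_rate=0.2, seed=0):
    """Search for a low-impurity threshold with a simple genetic algorithm.

    Assumed design (not taken from the thesis): individuals are real-valued
    thresholds, the better half survives each generation, children are
    convex combinations of two survivors, and mutation adds Gaussian noise.
    """
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half unchanged.
        population.sort(key=lambda t: split_cost(xs, ys, t))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            w = rng.random()
            child = w * a + (1 - w) * b  # arithmetic crossover
            if rng.random() < mutation_rate:
                # Mutation: perturb the threshold, then clamp to the data range.
                child += rng.gauss(0, 0.1 * (hi - lo))
                child = min(max(child, lo), hi)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda t: split_cost(xs, ys, t))
```

For example, with `xs = [1, 2, 3, 4, 10, 11, 12, 13]` and `ys = [0, 0, 0, 0, 1, 1, 1, 1]`, any threshold from 4 up to (but not including) 10 separates the two classes, and `ga_best_threshold(xs, ys)` can be checked against `split_cost` to confirm that the returned threshold gives zero impurity. Because the survivors are carried over unchanged, the best threshold found never gets worse from one generation to the next.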
Keywords/Search Tags:Data Mining, CART Algorithm, Decision Tree, Genetic Algorithm