
Research On Parameter Optimization Algorithm Of Improved Random Forest Model

Posted on: 2020-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Li  Full Text: PDF
GTID: 2428330623465346  Subject: Software engineering
Abstract/Summary:
Random forest, an ensemble classification technique, is widely used in artificial intelligence, machine learning, pattern recognition, and other fields because of its strong noise resistance and easy parallelization. However, subtrees of differing quality vote with equal weight during model construction, which lowers the model's accuracy and generalization performance. In addition, model training involves many hyperparameters, and traditional grid search cannot quickly find the best parameter combination in a targeted manner. To address these problems, a parameter optimization algorithm for an improved random forest model is proposed. First, a logistic regression model is incorporated into subtree construction; taking advantage of the efficiency of logistic regression, an error rate is calculated on the out-of-bag data of each subtree. Second, subtrees whose error rate exceeds a certain threshold are eliminated to speed up prediction, and the relationship derived from the log-odds is used to convert each remaining subtree's error rate into a subtree weight. In the final model's voting, subtrees with lower error rates play a larger role and subtrees with higher error rates play a smaller one. Finally, the artificial fish swarm acceleration algorithm is applied to optimize the parameters of the improved random forest model, forming a complete data classification model. Experimental results on four UCI datasets (Bank, Covtype, Credit, and Connect) show that, compared with the original random forest algorithm, the AUC and F1 values increase by 2.14% and 1.98%, respectively; compared with the AdaBoost algorithm, they increase by 1.74% and 2.23% on average. Compared with grid search, time consumption on six UCI datasets (Bank, Covtype, Credit, Connect, Font, and Active) is reduced by 25.5% on average, and accuracy increases by 2%~3% on average. This provides a feasible method for parameter optimization of the random forest algorithm. This dissertation has 27 figures, 16 tables, and 55 references.
Keywords/Search Tags: Random forest, Logistic regression, Weight fusion, Parameter optimization, Artificial fish swarm acceleration algorithm
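
The pruning and weighted-voting steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weight formula, so this sketch assumes the common log-odds form w = ln((1 - e) / e), where e is a subtree's out-of-bag error rate. The function names, the 0.4 threshold, and the sample numbers are hypothetical, and the logistic regression step and the artificial fish swarm search are not reproduced here.

```python
import math


def subtree_weights(oob_errors, max_error=0.4, eps=1e-6):
    """Convert out-of-bag error rates into voting weights.

    Assumption (not taken from the thesis): weight = ln((1 - e) / e).
    Subtrees whose error rate exceeds max_error are dropped (weight None).
    """
    weights = []
    for e in oob_errors:
        if e > max_error:
            weights.append(None)  # eliminated subtree, takes no part in voting
            continue
        e = min(max(e, eps), 1 - eps)  # keep the logarithm finite
        weights.append(math.log((1 - e) / e))
    return weights


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among kept subtrees."""
    totals = {}
    for label, w in zip(predictions, weights):
        if w is None:
            continue
        totals[label] = totals.get(label, 0.0) + w
    if not totals:
        raise ValueError("all subtrees were eliminated")
    return max(totals, key=totals.get)


# Hypothetical example: four subtrees, one accurate and three weaker ones.
oob_errors = [0.05, 0.30, 0.35, 0.45]
predictions = ["yes", "no", "no", "no"]

weights = subtree_weights(oob_errors)
print(weighted_vote(predictions, weights))  # prints "yes"
```

In this example an equal-weight majority vote would return "no", while the weighted vote follows the single low-error subtree; the subtree with error 0.45 is removed before voting.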