Research On Optimization And Improvement Of Random Forests Algorithm

Posted on: 2017-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Ma
Full Text: PDF
GTID: 2348330503967001
Subject: Science, applied mathematics
Abstract/Summary:
Random forests is a classification algorithm that is broadly applicable, widely used, and resistant to over-fitting. It nevertheless has shortcomings that need to be addressed. This thesis reviews the theory and research status of classification algorithms in general and of random forests in particular, and on that basis proposes several improved algorithms. The specific work is as follows:

(1) Several classification algorithms are studied and simulated on UCI datasets. After summarizing the comprehensive evaluation indexes, we compare the results of logistic regression, naive Bayes, neural networks, support vector machines and random forests; the comparison verifies the superiority of random forests.

(2) To handle imbalanced data and address the disadvantages of SMOTE, we propose the CURE-SMOTE algorithm. In experiments on imbalanced UCI datasets, we compare the classification results obtained with the original data, random sampling, Borderline-SMOTE1, Safe-Level-SMOTE, C-SMOTE and Kmeans-SMOTE. The samples generated by the proposed algorithm are closer to the original data distribution, contain the least noise and give the best classification results, which shows that the algorithm is effective and feasible.

(3) Feature selection and parameter settings are key factors in the performance of the algorithm. We propose a hybrid algorithm based on random forests that uses intelligent algorithms for simultaneous feature selection and parameter optimization. With binary encoding, the number of trees, the number of attributes and the selected features are searched at the same time, with the minimum out-of-bag (OOB) error as the objective function. On binary-classification and higher-dimensional datasets, we compare random forests with traditional parameter values and no feature selection against hybrid genetic random forests, hybrid particle swarm random forests and hybrid fish swarm random forests.
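
The binary encoding in item (3) can be illustrated with a minimal sketch of how one bit string might be decoded into a candidate solution. The abstract does not give the actual layout, so the bit widths (`TREE_BITS`, `ATTR_BITS`), the `Candidate` class, the `decode` function and the repair rule for an empty feature mask are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical bit widths: the abstract does not state the real layout.
TREE_BITS = 8  # bits encoding the number of trees
ATTR_BITS = 4  # bits encoding the number of attributes tried per split


@dataclass
class Candidate:
    n_trees: int
    n_attributes: int
    feature_mask: List[bool]


def bits_to_int(bits: Sequence[int]) -> int:
    """Interpret a sequence of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def decode(chromosome: Sequence[int], n_features: int) -> Candidate:
    """Split one bit string into tree count, attribute count and feature mask."""
    expected = TREE_BITS + ATTR_BITS + n_features
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} bits, got {len(chromosome)}")
    if any(bit not in (0, 1) for bit in chromosome):
        raise ValueError("chromosome must contain only 0 and 1")

    tree_bits = chromosome[:TREE_BITS]
    attr_bits = chromosome[TREE_BITS:TREE_BITS + ATTR_BITS]
    mask_bits = chromosome[TREE_BITS + ATTR_BITS:]

    feature_mask = [bit == 1 for bit in mask_bits]
    if not any(feature_mask):
        # Assumed repair rule: a forest needs at least one feature.
        feature_mask[0] = True

    n_selected = sum(feature_mask)
    n_trees = bits_to_int(tree_bits) + 1  # range 1..2**TREE_BITS
    # The per-split attribute count cannot exceed the selected feature count.
    n_attributes = min(bits_to_int(attr_bits) + 1, n_selected)
    return Candidate(n_trees, n_attributes, feature_mask)
```

For example, `decode([0,1,1,0,0,0,1,1] + [0,0,1,0] + [1,0,1,1,0,1], 6)` yields 100 trees, 3 attributes per split and a mask selecting four of the six features. A search algorithm (genetic, particle swarm or fish swarm) would then train a forest for each decoded candidate and use its OOB error as the fitness value.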
The values of the F-measure, G-mean, AUC and OOB error show that the proposed algorithm improves the performance of random forests. It also provides a new approach to feature selection and parameter optimization.
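
Two of the evaluation measures named above can be computed directly from confusion-matrix counts. The sketch below assumes the common definitions: the F-measure as the harmonic mean of precision and recall (F1), and G-mean as the square root of sensitivity times specificity. The abstract does not say which variants the thesis uses, and the function name `imbalance_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Compute F1 and G-mean for a binary problem (assumed definitions)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"f_measure": f_measure, "g_mean": g_mean}
```

Unlike plain accuracy, G-mean drops to zero whenever a classifier ignores one class entirely, which is why it is commonly reported for imbalanced data.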
Keywords/Search Tags: Random Forests, Imbalanced Data, Intelligent Algorithm, Feature Selection, Parameter Optimization