
Improved Random Forests for Imbalanced Data Classification

Posted on: 2018-12-30    Degree: Master    Type: Thesis
Country: China    Candidate: Z T Wei    Full Text: PDF
GTID: 2348330521950290    Subject: Applied Mathematics
Abstract/Summary:
The random forest algorithm, a widely used classification method, is essentially an ensemble classifier. Single classifiers often hit performance bottlenecks on classification problems, but combining them through ensemble ideas yields good results. The core of the algorithm is to randomly draw training sample sets with the Bootstrap resampling method, build a collection of tree classifiers on those samples, and classify by voting over the trees. When the data are imbalanced, however, Bootstrap sampling can produce invalid training sample sets, which does not help with the imbalanced-data problem. Moreover, in the standard random forest every decision tree has equal status in the vote. Both issues distort the final voting result and reduce the classification performance of the algorithm.

This thesis first presents an improved Bootstrap resampling method to address the sampling problem. The quality of each training sample set is guaranteed by a threshold based on the non-equilibrium (imbalance) coefficient. This yields a better set of decision trees, makes the voting result more accurate, and lets the random forest handle imbalanced classification problems better.

Second, because of the randomness of Bootstrap sampling, different decision trees have different classification performance. As an ensemble classifier, the random forest combines decision trees through a voting rule, but the standard voting rule does not account for the differences between base classifiers, which leads to poor classification results. We therefore weight each decision tree by the non-equilibrium coefficient and obtain several weighted random forest algorithms based on these coefficients, which further improve classification performance.

The experiments use twelve imbalanced binary classification datasets from the KEEL dataset repository, with imbalance ratios ranging from 1.25 to 42. The results show that both improvements enhance, to some extent, the quality of random forest classification on imbalanced data, and that applying the second improvement on top of the first improves the algorithm further.
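For a concrete picture of the two ideas described above, the following Python sketch combines a threshold-filtered Bootstrap step with imbalance-weighted voting. It is not the thesis's exact algorithm: the abstract does not give the threshold rule or the weight formula, so the versions below (reject a bootstrap sample whose majority/minority ratio exceeds a threshold; weight each tree's vote by the inverse of that ratio) are illustrative assumptions only, as are the class and parameter names.

```python
# Illustrative sketch only: (1) bootstrap sample sets whose class ratio is too
# skewed are redrawn, using a threshold on the imbalance coefficient, and
# (2) each tree's vote is weighted by how balanced its training set was.
# The threshold rule and weight formula are assumptions for demonstration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def imbalance_coefficient(y):
    """Majority-class count divided by minority-class count (>= 1)."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()


class ThresholdWeightedRandomForest:
    def __init__(self, n_trees=100, max_imbalance=5.0, max_retries=20, random_state=None):
        self.n_trees = n_trees
        self.max_imbalance = max_imbalance      # threshold on the imbalance coefficient
        self.max_retries = max_retries
        self.rng = np.random.default_rng(random_state)
        self.trees, self.weights = [], []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n = len(y)
        for _ in range(self.n_trees):
            # Redraw the bootstrap sample until its imbalance coefficient falls
            # below the threshold (or retries run out), discarding sample sets
            # that contain too few minority-class examples.
            for _ in range(self.max_retries):
                idx = self.rng.integers(0, n, size=n)
                if len(np.unique(y[idx])) > 1 and \
                        imbalance_coefficient(y[idx]) <= self.max_imbalance:
                    break
            tree = DecisionTreeClassifier(max_features="sqrt",
                                          random_state=int(self.rng.integers(1 << 31)))
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
            # Assumed weighting: trees trained on more balanced samples vote
            # with larger weight (weight = 1 / imbalance coefficient).
            self.weights.append(1.0 / imbalance_coefficient(y[idx]))
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        votes = np.zeros((len(X), len(self.classes_)))
        for tree, w in zip(self.trees, self.weights):
            pred = tree.predict(X)
            for k, c in enumerate(self.classes_):
                votes[:, k] += w * (pred == c)   # weighted vote instead of plain majority
        return self.classes_[np.argmax(votes, axis=1)]
```

Such a classifier would be used like an ordinary ensemble (`fit(X, y)` then `predict(X_test)`); in the thesis the two improvements are evaluated both separately and in combination on the KEEL datasets.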
Keywords/Search Tags: imbalanced data sets, Random forest, new Bootstrap sampling, Weighted decision tree