An empirical study of random forests for mining imbalanced data

Posted on:2008-09-08

Degree:M.S.

Type:Thesis

University:Florida Atlantic University

Candidate:Golawala, Moiz M

Full Text:PDF

GTID:2448390005472125

Subject:Computer Science

Abstract/Summary:

Skewed or imbalanced data present a significant problem for many standard learners, which focus on optimizing overall classification accuracy. When the class distribution is skewed, such learners prioritize classifying examples from the majority class, at the expense of the often more important minority class. The random forest (RF) classification algorithm, a relatively new learner with appealing theoretical properties, has received almost no attention in the context of skewed datasets. This work presents a comprehensive suite of experiments evaluating the effectiveness of random forests for learning from imbalanced data. Reasonable parameter settings (for the Weka implementation) for ensemble size and the number of random features selected are determined through experimentation on 10 datasets. Further, seven data sampling techniques commonly used to handle imbalanced data are assessed in conjunction with RF. Finally, RF is benchmarked against 10 other commonly used machine learning algorithms and is shown to provide very strong performance. In total, 35 imbalanced datasets are used and over one million classifiers are constructed.
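The accuracy problem described in the abstract can be illustrated with a small sketch. It is not taken from the thesis, which used the Weka implementation. It is a standard-library Python example on hypothetical toy data (95 majority examples, 5 minority examples). The helper names `accuracy`, `recall`, and `random_undersample` are assumptions made for illustration, and random undersampling is used only as one common example of a data sampling technique; the abstract does not name the seven techniques that were evaluated.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of examples of class `positive` that were predicted correctly."""
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds) / len(preds)


def random_undersample(X, y, seed=0):
    """Randomly drop examples so every class has as many as the rarest class.

    Illustrative helper (an assumption, not the thesis's procedure).
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for i, label in enumerate(y):
        indices_by_class.setdefault(label, []).append(i)
    n_min = min(len(idxs) for idxs in indices_by_class.values())
    keep = []
    for idxs in indices_by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(len(y))]

# A trivial "learner" that always predicts the majority class.
majority_pred = [0] * len(y)
acc = accuracy(y, majority_pred)                   # high overall accuracy
min_recall = recall(y, majority_pred, positive=1)  # but no minority example found

# Balance the class distribution before training a learner on it.
X_bal, y_bal = random_undersample(X, y)
balanced_counts = Counter(y_bal)

print(f"accuracy={acc:.2f}, minority recall={min_recall:.2f}")
print(f"class counts after undersampling: {dict(balanced_counts)}")
```

The always-majority predictor scores well on overall accuracy while missing every minority example, which is why accuracy alone is a poor guide on skewed data. After undersampling, both classes have the same number of examples, at the cost of discarding most of the majority class.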

Keywords/Search Tags:

Imbalanced data, Random, Class

Related items

1	Research For Imbalanced Big Data Classification Algorithm On Random Forest
2	Class-Imbalanced Data Stream Classification Method Based On Adaptive Random Forest
3	Improvement Of Preprocessing Technology And Algorithm On Multi-class Imbalanced Data Set
4	Class-imbalanced Learning Based On Data Smoothing
5	Research Of Multi-class Imbalanced Data Classification Method
6	The Improved Random Forests Based On The Imbalanced Data Classification
7	Research On Classification Method Of High-dimensional Class-imbalanced Data Sets Based On SVM
8	Research On Multi-class Imbalanced Data Learning Algorithm Based On One-to-one Decomposition
9	Research On Imbalanced Data Classification Method Based On Random Forest Algorithm
10	Research On The Key Technologies And The Applications For The Class Of Imbalance Problem