
Online ensemble learning

Posted on: 2002-04-03
Degree: Ph.D
Type: Thesis
University: University of California, Berkeley
Candidate: Oza, Nikunj Chandrakant
Full Text: PDF
GTID: 2468390011498989
Subject: Computer Science
Abstract/Summary:
This thesis presents online versions of the popular bagging and boosting algorithms. We demonstrate theoretically and experimentally that the online versions perform comparably to their original batch counterparts in terms of classification performance. Moreover, our online algorithms yield the practical benefits typical of online learning algorithms when the amount of training data available is large.

Ensemble learning algorithms have become extremely popular over the last several years because these algorithms, which generate multiple base models using traditional machine learning algorithms and combine them into an ensemble model, often perform significantly better than single models. Bagging and boosting are two of the most popular ensemble algorithms because of their good empirical results and theoretical support. However, most ensemble algorithms operate in batch mode, i.e., they repeatedly read and process the entire training set. Typically, they require at least one pass through the training set for every base model to be included in the ensemble, and the base model learning algorithms themselves may require several passes through the training set to create each base model. In situations where data is generated continuously, storing it for batch learning is impractical, which makes these ensemble learning algorithms impossible to use. They are also impractical when the training set is large enough that reading and processing it many times would be prohibitively expensive.

This thesis describes online versions of bagging and boosting. Unlike the batch versions, our online versions require only one pass through the training examples, regardless of the number of base models to be combined. We discuss how we derive the online algorithms from their batch counterparts and present theoretical and experimental evidence that our online algorithms perform comparably to the batch versions in terms of classification performance. We also demonstrate that our online algorithms have the practical advantage of lower running time, especially on larger datasets. This makes our online algorithms practical for machine learning and data mining tasks where the amount of training data available is very large.
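To make the one-pass property concrete, the sketch below illustrates the core idea behind online bagging as published by Oza and Russell: each base model is updated on a new example k times, where k is drawn from a Poisson(1) distribution, which approximates the bootstrap resampling of batch bagging as the number of examples grows. This is a minimal sketch, not the thesis's actual implementation; the names used here (OnlineBagging, partial_fit, MajorityClass) are illustrative assumptions.

```python
import math
import random
from collections import Counter


class OnlineBagging:
    """Minimal sketch of online bagging, assuming base learners
    that expose an incremental partial_fit(x, y) interface."""

    def __init__(self, make_base_model, n_models=10, seed=0):
        # make_base_model: any factory for an incremental learner
        # (a hypothetical interface assumed for this sketch).
        self.models = [make_base_model() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def _poisson1(self):
        # Knuth's method for Poisson(lambda=1): multiply uniform
        # draws until the running product falls below e^{-1}.
        threshold = math.exp(-1.0)
        k, p = 0, self.rng.random()
        while p > threshold:
            k += 1
            p *= self.rng.random()
        return k

    def update(self, x, y):
        # Online analogue of the bootstrap: each base model sees
        # the new example k ~ Poisson(1) times instead of sampling
        # with replacement from a stored training set.
        for model in self.models:
            for _ in range(self._poisson1()):
                model.partial_fit(x, y)

    def predict(self, x):
        # Unweighted majority vote over the ensemble, as in
        # batch bagging.
        votes = Counter(model.predict(x) for model in self.models)
        return votes.most_common(1)[0][0]


class MajorityClass:
    # Toy incremental learner used only to exercise the sketch:
    # it ignores x and predicts the most frequent label seen.
    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


ensemble = OnlineBagging(MajorityClass, n_models=5)
for x, y in [([0.1], "a"), ([0.4], "a"), ([0.9], "b")]:
    ensemble.update(x, y)       # one pass: each example seen once
print(ensemble.predict([0.5]))  # majority vote over the base models
```

Because each arriving example is processed once and then discarded, memory use is independent of the stream length, which is the practical advantage the abstract describes for large datasets.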
Keywords/Search Tags: Online, Algorithms, Ensemble, Training, Bagging and boosting, Data