
Constructive and destructive optimization methods for predictive ensemble learning

Posted on: 2007-02-21
Degree: Ph.D
Type: Dissertation
University: The University of Iowa
Candidate: Zhang, Yi
GTID: 1458390005989278
Subject: Business Administration
Abstract/Summary:
An ensemble is a group of classifiers that jointly solve a classification problem. Both theoretical and empirical findings show that an ensemble often outperforms a single classifier. In addition, the ensemble learning framework provides tools to tackle complicated or large problems that were once infeasible for traditional algorithms. The key issue in ensemble learning is how to create a relatively small ensemble with a good bias-variance trade-off. Currently, there are two general approaches to this goal: constructive building, e.g. bagging and boosting, and destructive pruning, e.g. Kappa pruning. This dissertation proposes two new ensemble algorithms: bagging with adaptive cost (bacing) and semidefinite-programming (SDP)-based pruning. By iteratively adjusting the weights of the training points, bacing focuses on the boundary points that the current ensemble only barely classifies correctly or incorrectly, so that the margins on these "difficult" points can be maximized. SDP-based pruning, on the other hand, selects a subset of classifiers from an existing ensemble to improve both the effectiveness and the efficiency of the ensemble.

Along with standard computational experiments comparing the two algorithms with peer algorithms on a variety of datasets, several new problems are used to demonstrate their capabilities. Specifically, bacing is applied to point-wise lift-curve optimization. SDP-based pruning is evaluated in the classifier-sharing scenario, where classifiers from different but related problem domains are pooled and a subset of them is selected for each domain. The effect of classifier sharing under data scarcity is also studied. Our results indicate that bacing outperforms bagging and boosting, and that a variation of bacing marginally improves targeted points on a lift curve. SDP-based pruning is also ahead of its peer algorithms in terms of accuracy. Its application in the classifier-sharing scenario substantially improves the effectiveness and robustness of the selected ensembles.
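The adaptive-weighting idea behind bacing can be sketched in a few lines: train base classifiers on weighted bootstrap samples, then shift weight toward points whose ensemble margin (fraction of correct votes) is near zero. This is a minimal illustration on hypothetical 1-D data with decision stumps, not the dissertation's actual algorithm or cost schedule.

```python
import random

random.seed(0)

# Hypothetical toy data: class +1 centered at x = 1, class -1 at x = -1,
# with overlap near the boundary at 0.
data = [(random.gauss(1.0, 1.0), +1) for _ in range(50)] + \
       [(random.gauss(-1.0, 1.0), -1) for _ in range(50)]

def train_stump(sample):
    """Fit a threshold classifier (x > t -> +1) by minimizing sample error."""
    best_t, best_err = 0.0, float("inf")
    for t, _ in sample:
        err = sum(1 for x, y in sample if (1 if x > t else -1) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def ensemble_margin(x, y, stumps):
    """Signed vote fraction for the true label; near 0 means a boundary point."""
    votes = sum(1 if x > t else -1 for t in stumps)
    return y * votes / len(stumps)

stumps = []
weights = [1.0] * len(data)
for _ in range(15):
    # Weighted bootstrap: high-weight points are resampled more often.
    sample = random.choices(data, weights=weights, k=len(data))
    stumps.append(train_stump(sample))
    # Adaptive cost: concentrate weight on low-margin ("difficult") points.
    weights = [max(0.01, 1.0 - abs(ensemble_margin(x, y, stumps)))
               for x, y in data]

# Training accuracy of the final majority-vote ensemble.
acc = sum(1 for x, y in data
          if (1 if sum(1 if x > t else -1 for t in stumps) > 0 else -1) == y)
acc /= len(data)
print(f"training accuracy: {acc:.2f}")
```

Unlike boosting's exponential reweighting, the update here keeps weight on points the ensemble gets barely right as well as barely wrong, mirroring the margin-maximization intuition described above.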
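SDP-based pruning casts "pick the best subset of an existing ensemble" as an optimization problem; the dissertation relaxes it into a semidefinite program. The sketch below illustrates only the underlying subset-selection objective, solved by brute force on a tiny hypothetical pool of stored predictions (a real implementation would use an SDP solver rather than enumeration).

```python
from itertools import combinations

# Hypothetical 0/1 predictions of 5 classifiers on 8 points, plus true labels.
preds = [
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
]
labels = [1, 1, 0, 1, 0, 1, 1, 0]

def majority_error(subset):
    """Error rate of the majority vote over the chosen classifiers."""
    errs = 0
    for i, y in enumerate(labels):
        vote = sum(preds[c][i] for c in subset)
        pred = 1 if 2 * vote > len(subset) else 0
        errs += pred != y
    return errs / len(labels)

# Prune the pool down to k classifiers by minimizing majority-vote error.
k = 3
best = min(combinations(range(len(preds)), k), key=majority_error)
print("selected classifiers:", best, "error:", majority_error(best))
```

In the classifier-sharing setting described above, the candidate pool would contain classifiers trained on related domains, and the same selection step runs once per domain.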
Keywords/Search Tags: Ensemble, Classifier