
Algorithmic stability and ensemble-based learning

Posted on: 2003-06-05
Degree: Ph.D.
Type: Dissertation
University: The University of Chicago
Candidate: Kutin, Samuel Alan
Full Text: PDF
GTID: 1468390011489179
Subject: Computer Science
Abstract/Summary:
We explore two themes in formal learning theory. We begin with a detailed, general study of the relationship between the generalization error and stability of learning algorithms. We then examine ensemble-based learning from the points of view of stability, decorrelation, and threshold complexity.

A central problem of learning theory is bounding generalization error. Most such bounds have been obtained through uniform convergence, via some variant of VC dimension. These analyses focus on the space of classifiers available to a learning algorithm, rather than on the algorithm itself.

Bousquet and Elisseeff (2002) have shown, using McDiarmid's method of independent bounded differences (1989), that algorithmic stability implies good bounds on generalization error. However, their definition of stability is too restrictive to be widely applicable.

We introduce the more general notion of training stability. We show that training stability implies good bounds on generalization error even when the learner has infinite VC dimension. Our proof requires a result of independent interest: we generalize McDiarmid's theorem to the case when differences are bounded with high probability.

In the Probably-Approximately-Correct setting of Valiant (1984), training stability is necessary and sufficient for good error performance and serves as a distribution-dependent analog of VC dimension. This enables us to emphasize the learning algorithm instead of the classifier space.

We next consider algorithms (e.g., boosting) which construct ensembles of weak classifiers. We show that AdaBoost (Freund and Schapire, 1997), a widely used boosting algorithm, is stability-preserving.

We demonstrate that some boosting algorithms implicitly construct classifiers which are decorrelated (i.e., rarely jointly wrong). We present two new algorithms which build ensembles of explicitly decorrelated classifier pairs. Experiments indicate that these new algorithms are sometimes competitive with AdaBoost and with bagging (Breiman, 1996). The experiments employ a fast new algorithm for finding decorrelated pairs of decision stumps.

Lastly, we use threshold complexity to show that, under certain conditions, AdaBoost takes a long time to converge. In some situations, our pair-based algorithms are exponentially faster than AdaBoost. We analyze boosting as a dynamical system over the space of sequences of weights.
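For context, the classical results the abstract builds on can be stated as follows. These are the standard formulations due to McDiarmid (1989) and Bousquet and Elisseeff (2002), not the dissertation's extended versions in which the differences are only bounded with high probability.

McDiarmid's bounded-differences inequality: if X_1, ..., X_m are independent and f changes by at most c_i when its i-th argument is replaced, then for every epsilon > 0,

\[
\Pr\big[f(X_1,\dots,X_m) - \mathbb{E}\,f(X_1,\dots,X_m) \ge \epsilon\big]
\;\le\; \exp\!\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right).
\]

Uniform-stability generalization bound (Bousquet and Elisseeff, 2002): if a learning algorithm A has uniform stability \beta with respect to a loss bounded by M, then with probability at least 1 - \delta over an i.i.d. training sample S of size m,

\[
R(A_S) \;\le\; \hat{R}_{\mathrm{emp}}(A_S) + 2\beta + (4m\beta + M)\sqrt{\frac{\ln(1/\delta)}{2m}}.
\]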
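The boosting algorithm referred to above is the standard AdaBoost weighting scheme of Freund and Schapire (1997). The sketch below is a generic textbook implementation with decision stumps as weak learners, written in Python for illustration; it is not the dissertation's experimental code, and the function names are ours.

# Textbook AdaBoost with decision stumps as weak learners (illustrative sketch).
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump minimizing weighted error."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                     # (error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, T=50):
    """Labels y must be in {-1, +1}. Returns a list of (alpha, stump) pairs."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform initial example weights
    ensemble = []
    for _ in range(T):
        err, j, thr, pol = fit_stump(X, y, w)
        err = max(err, 1e-12)                      # guard against zero weighted error
        alpha = 0.5 * np.log((1 - err) / err)      # weak learner's vote weight
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, (j, thr, pol)))
    return ensemble

def predict(ensemble, X):
    """Weighted-majority vote of the stumps in the ensemble."""
    score = np.zeros(len(X))
    for alpha, (j, thr, pol) in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)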
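A decorrelated pair, in the abstract's sense, is a pair of classifiers that is rarely jointly wrong. The following naive Python sketch only makes that objective concrete by brute-force search over pairs of decision stumps; it is not the fast pairing algorithm the experiments employ, and all names here are ours.

# Illustrative brute-force search for the most decorrelated pair of decision
# stumps, i.e. the pair minimizing the fraction of jointly misclassified examples.
import numpy as np
from itertools import combinations

def stump_predictions(X):
    """Enumerate the predictions of every (feature, threshold, polarity) stump."""
    n, d = X.shape
    stumps, preds = [], []
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                stumps.append((j, thr, pol))
                preds.append(np.where(pol * (X[:, j] - thr) >= 0, 1, -1))
    return stumps, np.array(preds)

def most_decorrelated_pair(X, y):
    """Return the stump pair with the smallest joint-error rate on (X, y)."""
    stumps, preds = stump_predictions(X)
    wrong = preds != y                             # boolean matrix: stump x example
    best, best_pair = np.inf, None
    for a, b in combinations(range(len(stumps)), 2):
        joint = np.mean(wrong[a] & wrong[b])       # fraction jointly wrong
        if joint < best:
            best, best_pair = joint, (stumps[a], stumps[b])
    return best_pair, best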
Keywords/Search Tags: Algorithm, Stability, VC dimension, Generalization error, Boosting, AdaBoost