
Theoretical Research on Boosting

Posted on: 2015-11-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Gao    Full Text: PDF
GTID: 1318330518489326    Subject: Computer application technology
Abstract/Summary:
Boosting is one of the mainstream classification algorithms and has long attracted much attention in the international machine learning community. Research on Boosting and related topics has produced highly influential work; for example, the Godel Prize, one of the most prestigious awards in theoretical computer science, in 2003, and the Longuet-Higgins Prize, one of the most prestigious awards in computer vision and pattern recognition, in 2011, were both awarded to work on Boosting. This dissertation theoretically analyzes several problems concerning Boosting; the main contributions are summarized as follows.

(1) Present margin-distribution generalization bounds for Boosting, settling the fifteen-year debate over the margin explanation of Boosting. Schapire et al. (1998) proposed the margin theory to explain the resistance of AdaBoost to overfitting, whereas Breiman (1999) cast serious doubt on it both theoretically and empirically. Since then, the margin explanation of Boosting has been the subject of a lively debate. Based on new empirical Bernstein bounds, this dissertation presents margin-distribution generalization bounds, which provide strong support for the margin theory. (A standard formulation of the margin quantity is sketched after this abstract.)

(2) Propose approximation stability, and prove that the generalization ability of Boosting can be characterized by approximation stability. Stability is an effective tool for analyzing the generalization of learning algorithms, but the earlier L1 stability fails to capture the generalization of Boosting. This dissertation proposes approximation stability, which successfully characterizes the generalization of Boosting and also provides stability analyses for other learning tasks.

(3) Present a necessary and sufficient condition for multi-label consistency, disclose the inconsistency of Boosting, and develop consistent surrogate losses. Boosting is an effective approach to multi-label learning; however, theoretical understanding of the consistency of multi-label learning has been unclear. This dissertation presents a necessary and sufficient condition for the consistency of multi-label learning, and proves that Boosting is inconsistent with multi-label losses such as the ranking loss and the Hamming loss. The dissertation also introduces consistent surrogate losses.

(4) Present a sufficient condition and a necessary condition for AUC consistency, and prove that RankBoost is consistent with AUC and is equivalent to AdaBoost under proper conditions. AUC is an important performance measure widely used in diverse learning tasks, yet theoretical study of AUC consistency has been far from complete. This dissertation proves a necessary condition and a sufficient condition for AUC consistency. Based on these findings, RankBoost is proven to be consistent with AUC and to be equivalent to AdaBoost under proper conditions.

(5) Propose the OPAUC algorithm based on AUC consistency, which scans the data only once with storage independent of the training-set size. Because AUC is defined over pairs of positive and negative instances, previous AUC algorithms must store the entire dataset and scan it many times, which is challenging for big data. Inspired by the AUC consistency results, this dissertation develops the OPAUC algorithm, which scans the data only once and whose storage is independent of the training-set size. (A simplified sketch of the one-pass idea is given after this abstract.) We verify the effectiveness of OPAUC both theoretically and empirically.
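
To make the margin-related claims in contribution (1) concrete, the following is one standard formulation of the voting classifier and its margin. The symbols T, alpha_t, and h_t do not appear in the abstract and are supplied here only for illustration:

```latex
% Voting classifier produced by T rounds of boosting,
% with weights \alpha_t \ge 0 and base learners h_t(x) \in \{-1,+1\}:
f(x) \;=\; \frac{\sum_{t=1}^{T} \alpha_t\, h_t(x)}{\sum_{t=1}^{T} \alpha_t},
\qquad
\operatorname{margin}(x,y) \;=\; y\, f(x) \;\in\; [-1,\,1].
```

Margin bounds in the style of Schapire et al. (1998) control the generalization error by the fraction of training examples whose margin falls below a threshold plus a complexity term that shrinks as the threshold grows; the dissertation's margin-distribution bounds refine this picture via empirical Bernstein inequalities.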
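For contribution (5), recall that AUC is the probability that a randomly drawn positive instance is scored above a randomly drawn negative one, so a natural surrogate loss is defined over positive-negative pairs. The sketch below is a minimal, hypothetical illustration of the one-pass idea, not the dissertation's exact OPAUC algorithm: by keeping only per-class running means and second-moment matrices, the gradient of a pairwise squared surrogate can be computed for each arriving example without storing past data, so memory is O(d^2) regardless of the number of examples. The function name, the learning rate eta, and the regularizer lam are illustrative assumptions.

```python
import numpy as np

def one_pass_auc_sketch(stream, d, eta=0.01, lam=0.001):
    """Illustrative one-pass sketch (not the dissertation's exact OPAUC):
    stores only per-class running means and second moments, so memory is
    O(d^2), independent of the number of training examples."""
    w = np.zeros(d)
    mean = {+1: np.zeros(d), -1: np.zeros(d)}            # running class means
    sec = {+1: np.zeros((d, d)), -1: np.zeros((d, d))}   # running E[x x^T]
    count = {+1: 0, -1: 0}

    for x, y in stream:                  # each example is seen exactly once
        count[y] += 1
        mean[y] += (x - mean[y]) / count[y]
        sec[y] += (np.outer(x, x) - sec[y]) / count[y]

        other = -y
        if count[other] == 0:
            continue                     # no opposite-class statistics yet
        # Gradient of the pairwise squared surrogate loss, averaged over all
        # previously seen opposite-class examples and expressed through the
        # stored mean / second-moment statistics of that class.
        diff = y * (x - mean[other])
        cov = sec[other] - np.outer(mean[other], mean[other])
        grad = lam * w - diff + (np.outer(diff, diff) + cov) @ w
        w -= eta * grad
    return w

# Tiny usage example with synthetic data (labels must be +1 / -1):
rng = np.random.default_rng(0)
data = [(rng.normal(loc=y, size=5), y) for y in rng.choice([1, -1], size=200)]
w = one_pass_auc_sketch(iter(data), d=5)
```

The design point being illustrated is only that the per-example gradient depends on the opposite class solely through its mean and covariance, which is what makes a single pass with bounded storage possible.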
Keywords/Search Tags: Machine Learning, Learning Theory, Ensemble, Boosting, Generalization, Margin Theory, Stability, Multi-Label Learning, Online Learning, AUC, Consistency