
Research On Some Algorithms In Ensemble Learning

Posted on: 2011-10-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C X Zhang
Full Text: PDF
GTID: 1118330338489041
Subject: Applied Mathematics

Abstract/Summary:
Ensemble learning is a machine learning paradigm that combines many learning machines to solve a prediction problem. Because of its potential to significantly improve the generalization capability of a learning system, research on the theory and algorithms of ensemble learning has been one of the hot topics in the machine learning community over the past two decades. To date, ensemble learning techniques have been applied successfully in many fields, such as phoneme recognition, gene expression data analysis, remote sensing data processing, image processing, and text classification. However, the methodology is not yet fully mature and many problems remain to be studied. This dissertation presents a relatively in-depth study of ensemble learning. First, the concept, structure, function, and recent progress of ensemble learning are introduced, and the working mechanisms of two representative algorithm families, Bagging and Boosting, are analyzed. Second, the main work of this dissertation is summarized as follows:

(1) Based on the AdaBoost algorithm, a local boosting algorithm for classification problems is proposed, motivated by the observation that the performance of a classifier at a specific data point is closely related to its performance on the neighbors of that point. A parameter is introduced into the new algorithm so that it can be more accurate than AdaBoost. The performance of the proposed algorithm is investigated through numerical experiments, and the diversity-accuracy patterns of the ensemble classifiers built by the compared techniques are studied with Kappa-Error diagrams (an illustrative sketch of the kappa and error computation is given after this abstract) to further explain the good performance of the proposed method.

(2) Taking into account the special characteristics of the Bootstrap sampling used in Bagging, a new ensemble classifier generation approach is proposed that combines Bagging, Principal Component Analysis (PCA), and the Random Subspace method (see the second sketch after this abstract). Experiments on benchmark real-world data sets show that the prediction accuracy of the proposed method is significantly better than that of Bagging and Random Subspace. Although AdaBoost achieves comparable accuracy, the new method has an advantage in computational complexity.

(3) A new ensemble classifier construction technique is developed that exploits the advantages of Bagging and Rotation Forest, so that the resulting ensemble classifier attains better prediction accuracy and is more robust to classification noise. The decomposition of error into bias and variance terms is used to analyze the ensemble construction techniques further.

(4) The ensemble classification method Rotation Forest is extended to regression problems, and the effect of its parameter choices on performance is investigated through experiments on simulated and synthetic data.

(5) A selective ensemble learning algorithm is proposed that uses the main idea of Boosting to determine the order in which base predictors are aggregated into a Double-Bagging ensemble, thereby improving both the accuracy and the prediction speed of the original ensemble machines.

(6) Multi-response linear regression (MLR) is an effective trainable combiner for fusing heterogeneous base classifiers. Learning curves are employed to investigate the relative performance of MLR extensively, covering different training set sizes and different strategies for using the given data set to train both the base classifiers and the combiner, and to compare it with several other combination rules (a sketch of an MLR-style combiner is given after this abstract). The experimental results demonstrate that MLR performs better than the other combiners when the training set has a small sample size.

Throughout the dissertation, a large number of experiments were conducted on synthetic and real-world data sets. The obtained results indicate that the newly proposed algorithms perform satisfactorily and therefore provide feasible ways to solve prediction problems encountered in practical applications.
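The three sketches below are illustrative only; every function name, parameter, and design choice in them is ours, not the dissertation's. The first shows how a single point of a Kappa-Error diagram (contribution (1)) can be computed for one pair of base classifiers under the usual formulation: the pairwise kappa statistic measures agreement beyond chance, and the pair's average error rate gives the second coordinate.

```python
import numpy as np

def kappa_error_point(pred_a, pred_b, y_true, n_classes):
    """One point of a Kappa-Error diagram for a pair of base classifiers.

    Returns (kappa, mean_error): kappa is the agreement of the two
    prediction vectors beyond chance, mean_error the average of their
    individual error rates. Name and interface are illustrative.
    """
    pred_a, pred_b, y_true = map(np.asarray, (pred_a, pred_b, y_true))
    n = len(y_true)
    # C[i, j] = fraction of points labelled i by classifier A and j by B.
    C = np.zeros((n_classes, n_classes))
    for a, b in zip(pred_a, pred_b):
        C[a, b] += 1.0 / n
    theta1 = np.trace(C)                                     # observed agreement
    theta2 = float(np.sum(C.sum(axis=1) * C.sum(axis=0)))    # chance agreement
    kappa = (theta1 - theta2) / (1.0 - theta2)
    mean_error = 0.5 * (np.mean(pred_a != y_true) + np.mean(pred_b != y_true))
    return kappa, mean_error
```

Plotting one such point for every pair of ensemble members yields the point cloud whose position reflects the diversity-accuracy trade-off discussed in contribution (1).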
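Contribution (2) combines Bagging, PCA, and the Random Subspace method; the exact combination scheme is given in the dissertation itself. The following is only a minimal sketch of one plausible arrangement (bootstrap the training set, pick a random feature subspace, rotate it with PCA, then train a decision tree), assuming integer class labels and scikit-learn decision trees as base learners.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_ensemble(X, y, n_estimators=50, subspace_ratio=0.75):
    """One plausible Bagging + Random Subspace + PCA ensemble (a sketch)."""
    n, d = X.shape
    k = max(1, int(subspace_ratio * d))
    members = []
    for _ in range(n_estimators):
        rows = rng.integers(0, n, size=n)             # bootstrap sample (Bagging)
        cols = rng.choice(d, size=k, replace=False)   # random feature subspace
        pca = PCA().fit(X[rows][:, cols])             # rotate the subspace with PCA
        tree = DecisionTreeClassifier().fit(pca.transform(X[rows][:, cols]), y[rows])
        members.append((cols, pca, tree))
    return members

def predict_ensemble(members, X):
    """Majority vote over the base trees (assumes integer class labels)."""
    votes = np.array([t.predict(p.transform(X[:, c]))
                      for c, p, t in members]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```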
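For contribution (6), MLR is the trainable combiner under study. A standard way to formulate it, not necessarily the dissertation's exact training protocol, is to stack the base classifiers' outputs into a feature matrix, fit one indicator-response linear regression per class, and predict the class with the largest fitted response:

```python
import numpy as np

def fit_mlr_combiner(P, y, n_classes):
    """Fit a multi-response linear regression (MLR) combiner (a sketch).

    P: (n_samples, n_features) stacked base-classifier outputs, e.g. the
       concatenated class-probability vectors of all base classifiers.
    y: integer class labels.
    Returns a weight matrix W such that scores = [P, 1] @ W.
    """
    y = np.asarray(y)
    Y = np.eye(n_classes)[y]                     # one indicator response per class
    A = np.hstack([P, np.ones((len(P), 1))])     # add an intercept column
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)    # one least-squares fit per class
    return W

def predict_mlr(W, P):
    A = np.hstack([P, np.ones((len(P), 1))])
    return np.argmax(A @ W, axis=1)              # class with the largest response
```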
Keywords/Search Tags: Ensemble learning, Base learning algorithm, Base classifier, Ensemble classifier, Kappa-Error diagram