Ensemble methods can often improve on the performance of a single classifier, but they have obvious drawbacks: the final model requires large storage space and prediction takes a long time. As a result, a variety of pruning techniques have been proposed. The goal of pruning is not only to reduce the ensemble size effectively, but also to maintain or even enhance the predictive performance of the original ensemble. Ensemble pruning has therefore become an important subject in the field of machine learning. Pruning Bagging, one of the classic ensemble methods, with the aim of improving prediction performance while reducing the ensemble size, has also attracted wide attention. However, currently existing methods for pruning Bagging usually require complex calculations, which greatly increases the computational cost of the pruning model. In this paper, two independent Bagging pruning methods are proposed: an accuracy-based pruning method and a distance-based pruning method. Furthermore, we present a two-stage hybrid pruning technique with a smaller ensemble size and higher prediction quality. The main contents of this paper are as follows.

(1) A Bagging pruning method based on accuracy. This method uses the prediction accuracy of each Bagging submodel on its out-of-bag sample set, and selects the submodels with relatively high performance according to a quantile threshold to construct a new ensemble model. This pruning method not only takes the predictive accuracy of each submodel into account, but also reduces the number of submodels in the original Bagging ensemble.

(2) A Bagging pruning method based on distance. For a classic Bagging ensemble, it first calculates the sample center of the out-of-bag sample set of each submodel, and then computes the distance between a new unknown sample and each of these centers. The submodels whose out-of-bag centers are distant from the new sample are filtered out, and a
new ensemble model is then established according to the quantile threshold. This pruning technique takes into account the individual differences among the samples to be predicted, so the selected set of submodels differs from one new sample to another. This not only reduces the number of Bagging submodels associated with each new sample, but also accelerates the forecasting task and improves prediction performance.

(3) Based on the above two independent pruning techniques, we propose a two-stage hybrid pruning method. We refer to the accuracy-based pruning method as P1 and the distance-based pruning method as P2. Depending on the combination order, the two-stage hybrid method comes in two variants, P1 + P2 and P2 + P1. The P1 + P2 method first selects submodels with the accuracy-based pruning method P1, and then filters them with the distance-based technique; the P2 + P1 method is the opposite: it first applies P2 pruning, and then uses P1 to reduce the ensemble further. The two-stage hybrid method combines the advantages of the two independent pruning techniques, which can further reduce the number of submodels while improving predictive performance.

Finally, we used 28 data sets from the UCI repository to evaluate the proposed pruning methods with 5-fold cross-validation. In our computational experiments, four kinds of base classifiers were used and compared in all ensemble models: decision tree, Gaussian naive Bayes, K-nearest neighbors, and logistic regression. The experimental results show that pruning the traditional Bagging algorithm can not only solve the problem of the large memory occupied by the ensemble model, but also further improve prediction accuracy. In the majority of cases, the two-stage hybrid pruning method proposed in this paper performed better than several existing similar optimization methods.
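The three pruning schemes described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function names, the decision-tree base learner, the Euclidean distance, and the default quantile threshold q = 0.5 are all assumptions made for the example; the hybrid variants (P1 + P2 and P2 + P1) then reduce to applying the two filters in either order.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def train_bagging(X, y, n_estimators=25, seed=0):
    """Train Bagging submodels, recording each one's out-of-bag (OOB) indices."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models, oob_sets = [], []
    for _ in range(n_estimators):
        boot = rng.integers(0, n, size=n)        # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # training points never drawn -> OOB set
        models.append(DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot]))
        oob_sets.append(oob)
    return models, oob_sets


def prune_by_accuracy(models, oob_sets, X, y, q=0.5):
    """P1: keep submodels whose OOB accuracy reaches the q-quantile threshold."""
    acc = np.array([m.score(X[o], y[o]) for m, o in zip(models, oob_sets)])
    keep = acc >= np.quantile(acc, q)
    return ([m for m, k in zip(models, keep) if k],
            [o for o, k in zip(oob_sets, keep) if k])


def prune_by_distance(models, oob_sets, X, x_new, q=0.5):
    """P2: keep submodels whose OOB-sample center lies close to the new sample."""
    centers = np.array([X[o].mean(axis=0) for o in oob_sets])  # one center per OOB set
    d = np.linalg.norm(centers - x_new, axis=1)                # distance to the new sample
    keep = d <= np.quantile(d, q)
    return ([m for m, k in zip(models, keep) if k],
            [o for o, k in zip(oob_sets, keep) if k])


def predict_vote(models, x_new):
    """Majority vote over the retained submodels."""
    votes = np.array([m.predict(x_new.reshape(1, -1))[0] for m in models])
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```

Under these assumptions, the P1 + P2 hybrid is simply `prune_by_accuracy` followed by `prune_by_distance` on the surviving submodels (and P2 + P1 is the reverse order), after which `predict_vote` classifies the new sample; note that P2, and therefore both hybrids, must be re-run for each new sample, since the retained subset depends on it.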