
Research On Optimization Algorithms Of Stacking Classifiers

Posted on: 2017-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Qin
Full Text: PDF
GTID: 2428330569998547
Subject: Computer Science and Technology
Abstract/Summary:
Stacking is a classical ensemble learning method. Stacking classifiers can achieve high generalization ability, have been widely applied in many fields, and are frequently the winning approach in data-mining competitions. A key step in obtaining a Stacking classifier with high generalization accuracy is selecting a proper configuration for the dataset at hand. On the other hand, with the arrival of the Big Data era, datasets are growing in scale, and the demand for mining large-scale datasets with machine-learning techniques is growing with them. When classifying large-scale datasets, the goal is to train classifiers of high generalization accuracy quickly. This thesis studies the configuration-selection problem and the distributed-training problem of Stacking classifiers. The main contributions are as follows:

To address the problem that the configuration selected by the existing genetic-algorithm-based configuration-selection method for Stacking classifiers is not accurate enough, we propose an improved configuration-selection algorithm called AGA-E (Advanced Genetic Algorithm Ensemble). To obtain a Stacking classifier of high accuracy, the algorithm balances base-classifier and meta-classifier selection through subspace partitioning, and reduces the reproduction of unpromising individuals with a tabu strategy.

Facing the dilemma that exhaustive search reaches the optimal solution at a large time cost while heuristic search finds only a local optimum at an affordable cost, we propose a pruning-based selection algorithm called PNEP-S (Positive and Negative Effects-based Pruned Stacking). To predict the accuracy of a configuration, the thesis introduces the concept of positive and negative effects, derived from empirical observation. PNEP-S performs extensive and effective pruning based on these effects, and can thereby reach a high-accuracy Stacking configuration at an affordable time cost.

To address the large time overhead of the standard Stacking training method on large-scale training sets, we propose a distributed training method for Stacking classifiers called StackingD. StackingD speeds up training by distributing the generation of the meta-level training set and the training of the base classifiers across the nodes. At the same time, by combining the corresponding base classifiers of each node with weighted majority voting, it keeps the loss of accuracy small.
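The abstract states that StackingD combines the corresponding base classifiers of each node by weighted majority voting. A minimal sketch of that combination step is shown below; the abstract does not specify how node weights are derived, so the use of per-node validation accuracy as the weight here is an illustrative assumption, and the function and variable names are hypothetical.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine the class labels predicted by each node's base classifier
    for a single sample, giving each node's vote its assigned weight.

    predictions: list of class labels, one per node
    weights:     list of node weights (assumed here: validation accuracy)
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The winning label is the one with the largest total vote weight.
    return max(scores, key=scores.get)

# Example: three nodes predict for one sample. Node 2 disagrees, but the
# combined weight of nodes 1 and 3 (0.90 + 0.60) outvotes it (0.80).
node_predictions = ["spam", "ham", "spam"]
node_weights = [0.90, 0.80, 0.60]
print(weighted_majority_vote(node_predictions, node_weights))  # prints "spam"
```

With uniform weights this reduces to plain majority voting; weighting by each node's held-out accuracy is one common way to limit the accuracy loss when base classifiers are trained on separate data partitions.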
Keywords/Search Tags: Stacking, Classification, Configuration Selection, Distributed Machine Learning