
Research On Optimization Algorithms Of Stacking Classifiers

Posted on: 2017-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Qin
Full Text: PDF
GTID: 2428330569998547
Subject: Computer Science and Technology
Abstract/Summary:
Stacking is a classical ensemble learning method. Stacking classifiers can achieve high generalization ability, have been widely applied in many fields, and are frequently the winning approach in data-mining competitions. A key step in obtaining a Stacking classifier with high generalization accuracy is selecting a proper configuration for the dataset at hand. On the other hand, with the arrival of the Big Data era, datasets are growing in scale, and the demand for mining large-scale datasets with machine-learning techniques is growing with them. When classifying large-scale datasets, the goal is to train classifiers of high generalization accuracy quickly. This thesis studies the configuration-selection problem and the distributed-training problem of Stacking classifiers. The main contributions are as follows:

To address the problem that the configuration selected by the existing genetic-algorithm-based configuration-selection method for Stacking classifiers is not accurate enough, we propose an improved configuration-selection algorithm called AGA-E (Advanced Genetic Algorithm Ensemble). To obtain a Stacking classifier of high accuracy, the algorithm balances base-classifier and meta-classifier selection through subspace partitioning, and reduces the reproduction of unpromising individuals with a tabu strategy.

Facing the dilemma that exhaustive search reaches the optimal solution at a large time cost while heuristic search finds only a local optimum at an affordable cost, we propose a pruning-based selection algorithm called PNEP-S (Positive and Negative Effects-based Pruned Stacking). To predict the accuracy of a configuration, the thesis introduces the concept of positive and negative effects, derived from empirical observation. PNEP-S performs extensive and effective pruning based on these effects, and can thereby reach a high-accuracy Stacking configuration at an affordable time cost.

To address the large time overhead of the standard Stacking training method on large-scale training sets, we propose a distributed training method for Stacking classifiers called StackingD. StackingD speeds up training by distributing the generation of the meta-level training set and the training of the base classifiers across the nodes. At the same time, by combining the corresponding base classifiers of each node with weighted majority voting, it keeps the loss of accuracy small.
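The abstract states that StackingD combines the corresponding base classifiers of each node by weighted majority voting. A minimal sketch of that combination step is shown below; the abstract does not specify how node weights are derived, so the use of per-node validation accuracy as the weight here is an illustrative assumption, and the function and variable names are hypothetical.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine the class labels predicted by each node's base classifier
    for a single sample, giving each node's vote its assigned weight.

    predictions: list of class labels, one per node
    weights:     list of node weights (assumed here: validation accuracy)
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The winning label is the one with the largest total vote weight.
    return max(scores, key=scores.get)

# Example: three nodes predict for one sample. Node 2 disagrees, but the
# combined weight of nodes 1 and 3 (0.90 + 0.60) outvotes it (0.80).
node_predictions = ["spam", "ham", "spam"]
node_weights = [0.90, 0.80, 0.60]
print(weighted_majority_vote(node_predictions, node_weights))  # prints "spam"
```

With uniform weights this reduces to plain majority voting; weighting by each node's held-out accuracy is one common way to limit the accuracy loss when base classifiers are trained on separate data partitions.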
Keywords/Search Tags: Stacking, Classification, Configuration Selection, Distributed Machine Learning