The Study And Improvement Of Stacking

Posted on:2019-10-09

Degree:Master

Type:Thesis

Country:China

Candidate:H L Xu

GTID:2428330566486426

Subject:Computational Mathematics

Abstract/Summary:

Classification is one of the most common problems in data mining. As traditional classification techniques (e.g., Logistic Regression, Decision Trees) have matured, their performance on classification problems has steadily improved. However, these single classifiers are prone to overfitting. Ensemble learning can effectively alleviate the overfitting encountered by traditional single classification algorithms. The Stacking algorithm is a special ensemble method that generates a meta-layer learner by combining the predictions of different individual learners. When a large amount of training data is available, Stacking is a strong ensemble method. Because it has at least two layers of learners, however, Stacking has a high computational cost. This thesis proposes an improved Stacking algorithm, based on class-probability outputs and multiple-response linear regression, to reduce the computing time of Stacking and to address the problem of limited sample data. The specific work is as follows:

(1) A three-layer Stacking structure is presented. The individual classifiers of the first layer are trained on the original data set. The second layer is represented by a new input attribute that increases the amount of second-level training data, while keeping the number of input attributes of each second-layer individual classifier from growing with the number of classifiers. The second layer uses several individual classifiers to relearn the outputs of the previous layer, which reduces the noise in the output probabilities of the first-layer individual classifiers. A voting strategy is added at the prediction stage to determine the sample's category. To select the individual classifiers in the improved Stacking algorithm intelligently, the combination of individual classifiers is optimized by a genetic algorithm.

(2) The improved algorithm is compared with other algorithms on multiple UCI data sets and the ORL image data set in terms of accuracy, precision, F1, and running time. The results show that it outperforms the other ensemble methods in accuracy, precision, and F1. Compared with the traditional Stacking algorithm based on probability distributions and multi-response linear regression, the running time is reduced on most of the data sets. In addition, the performance of the classifier optimized by the genetic algorithm is roughly equal to that of the classifier obtained by manual tuning, which shows that the genetic algorithm can replace manual tuning in optimizing the improved Stacking algorithm.
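Two ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. In traditional class-probability Stacking, the meta-level input for a sample is the concatenation of every first-layer classifier's class-probability vector, so its length grows with the number of classifiers, which is the growth the improved structure aims to avoid. The function names, the toy probability values, and the tie-breaking rule in the vote are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def class_probability_features(prob_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-classifier class-probability vectors into one
    meta-level feature vector (traditional class-probability Stacking)."""
    features: List[float] = []
    for probs in prob_outputs:
        features.extend(probs)
    return features


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(labels: Sequence[int]) -> int:
    """Most frequent label; ties go to the smallest label.
    The tie-breaking rule is an assumption, not taken from the thesis."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


# Toy outputs of three hypothetical first-layer classifiers
# for one sample in a 3-class problem (values are made up).
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
]

meta = class_probability_features(probs)  # 3 classifiers * 3 classes = 9 attributes
votes = [argmax(p) for p in probs]        # each classifier's predicted class
predicted = majority_vote(votes)          # voting at the prediction stage
```

Adding a fourth classifier to `probs` would lengthen `meta` from 9 to 12 attributes, while the voting step is unaffected by how many classifiers contribute.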

Keywords/Search Tags:

ensemble learning, Stacking algorithm, multiple response linear regression, genetic algorithm

Related items

1	Research And Application Of Stacking Algorithm Based On Multiple Meta Models
2	Improvement Of Genetic Algorithm With Surrogate Model
3	Design And Implementation Of Intrusion Detection Algorithm Based On Machine Learning
4	Study On The Frequency Of Auto Insurance Claims Based On Ensemble Learning Algorithm
5	The Research On Ensemble Incremental Learning Classification Algorithm
6	Research On Ensemble Regression Learning Based On Classifier Selection And Multiple Kernel Selection Under Least Squares Framework
7	Study On The Correction Model Of Non-covalent Interaction Based On Ensemble Learning
8	Research On AdaBoost Regression Tree-based Multi-target Prediction Algorithm
9	Research On Key Technologies Of Relevance Vector Regression Metamodeling And Their Application
10	Research On Stacking Classification Model Based On Adaptive Tuning