
The Study And Improvement Of Stacking

Posted on: 2019-10-09 | Degree: Master | Type: Thesis
Country: China | Candidate: H L Xu | Full Text: PDF
GTID: 2428330566486426 | Subject: Computational Mathematics
Abstract/Summary:
Classification is one of the most common problems in data mining. Through continued research, traditional classification techniques (e.g., logistic regression, decision trees) have steadily improved, but a single traditional classifier is still prone to overfitting. Ensemble learning can effectively alleviate the overfitting encountered by single classifiers. Stacking is a particular ensemble method that trains a meta-level learner on the combined predictions of different individual learners. When ample training data is available, Stacking is a strong ensemble method. However, because it has at least two layers of learners, its computational cost is high. This thesis proposes an improved Stacking algorithm, based on class-probability outputs and multi-response linear regression, that aims to reduce the computing time of Stacking and to address the problem of limited sample data. The specific work is as follows:

(1) A three-layer Stacking structure is presented. The individual classifiers of the first layer are trained on the original data set. The second layer's input is represented with a new input attribute, which increases the amount of second-layer training data while keeping the number of input attributes of each second-layer classifier from growing with the number of classifiers. The second layer uses several individual classifiers to relearn the outputs of the previous layer, reducing the noise in the class probabilities output by the first-layer classifiers. A voting strategy is added at the prediction stage to obtain each sample's category. To select the individual classifiers automatically, the improved Stacking algorithm is optimized with a genetic algorithm, which optimizes the combination of individual classifiers.

(2) The improved algorithm is compared with other algorithms on multiple UCI data sets and the ORL image data set in terms of accuracy, precision, F1, and running time. The results show that it outperforms the other ensemble methods in accuracy, precision, and F1. Compared with the traditional Stacking algorithm based on probability distributions and multi-response linear regression, the running time is reduced on most of the data sets. In addition, the classifier optimized by the genetic algorithm performs roughly on par with the one obtained by manual tuning, which indicates that the genetic algorithm can replace manual tuning when optimizing the improved Stacking algorithm.
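
The abstract does not spell out how the second-layer input is built, so the following is only an illustrative sketch of one possible reading, not the thesis's actual method. It contrasts the usual Stacking meta-features (class probabilities from all first-layer classifiers concatenated per sample, so the width grows with the number of classifiers) with a hypothetical row-wise layout (one row per classifier and sample, so the width stays at the number of classes while the number of training rows grows), and adds a simple majority vote for the prediction stage. All function names and the toy probabilities are assumptions made for illustration.

```python
from collections import Counter


def concat_meta_features(prob_outputs):
    """Usual Stacking layout: one row per sample,
    width = n_classifiers * n_classes."""
    n_samples = len(prob_outputs[0])
    return [
        [p for clf_probs in prob_outputs for p in clf_probs[i]]
        for i in range(n_samples)
    ]


def rowwise_meta_features(prob_outputs, labels):
    """Hypothetical layout (an assumption, not confirmed by the abstract):
    one row per (classifier, sample) pair, width = n_classes."""
    rows, targets = [], []
    for clf_probs in prob_outputs:
        for probs, y in zip(clf_probs, labels):
            rows.append(list(probs))
            targets.append(y)
    return rows, targets


def majority_vote(predictions):
    """Return the most frequent predicted label for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Toy class-probability outputs: 2 first-layer classifiers, 3 samples, 2 classes.
prob_outputs = [
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]],
]
labels = [0, 1, 0]

wide = concat_meta_features(prob_outputs)
rows, targets = rowwise_meta_features(prob_outputs, labels)
```

Under this reading, adding a third first-layer classifier would widen each row of `wide` by two columns, whereas `rows` would keep two columns and gain three more rows. The labels predicted by several second-layer classifiers for a single sample could then be combined with `majority_vote`.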
Keywords/Search Tags:ensemble learning, Stacking algorithm, multiple response linear regression, genetic algorithm