Application Of Partial Least Squares Regression Based On Stacking In Population Size Analysis

Posted on: 2022-04-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2480306722464174  Subject: Applied Statistics
Abstract/Summary:
With the advent of the 5G era and the development of data storage technology, more and more data are generated and recorded in daily life. These data take various forms and span a wide range of fields. As large data sets of many kinds keep emerging, the analysis of high-dimensional data, in which the number of features exceeds the number of samples, has become increasingly important. Partial least squares regression (PLS) handles such high-dimensional data well, but it fits nonlinear relationships poorly. Kernel partial least squares regression (KPLS) was developed to address this problem, but its computational complexity is high. To achieve nonlinear fitting at lower computational cost, this thesis embeds a stacking ensemble into the PLS model, constructs the stacking-plsr model, and tests its performance empirically. The thesis reaches the following conclusions:

1) For nonlinear high-dimensional data, the stacking-plsr model fits better than the PLS model, and its computational complexity is lower than that of the KPLS model. The equivalence between the stacking-plsr model and the polynomial KPLS model is discussed theoretically. To guard against overfitting, the sample data are divided into a training set and a test set. Compared with the traditional PLS model, the stacking-plsr model reduces the test-set MSE by 68.26% and the test-set ARE by 34.44%.

2) The robustness of the stacking-plsr model is tested by varying the training and test sets. The results show that the model is robust: the MSE and ARE of its predictions fluctuate little, and the model still fits and predicts effectively when the training set is small or contains extreme data.

3) The sensitivity of the stacking-plsr model is studied by fixing the number of extracted principal components and varying the hyperparameter degree. The Kruskal-Wallis test is used to analyze the results of different random trials. The sensitivity of the model is lower when the degree is greater than or equal to 3, which indicates that the stacking-plsr model is not sensitive to the value of the degree. The model also fits better with a degree of 3 than with a degree of 2, which indicates that the ensemble idea of stacking is an appropriate way to improve the traditional PLS model.

4) The fit of the model depends jointly on the number of extracted principal components and the hyperparameter degree, and the two parameters influence each other. A grid search is therefore used to select the most suitable hyperparameter combination. The grid search shows that the stacking-plsr model tends to select combinations with a larger degree and a smaller number of principal components.
Keywords/Search Tags: stacking ensemble, stacking-plsr model, grid search, Kruskal-Wallis test