The Theory And Applications Of Partial Least Squares Regression And Sparse Partial Least Squares Regression

Posted on: 2016-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: T T Guo
Full Text: PDF
GTID: 2180330470970806
Subject: Probability theory and mathematical statistics
Abstract/Summary:
High-dimensional, complex data now arise in many scientific fields, and they require statisticians to seek new modeling methods. One difficulty in handling such data is multicollinearity among the predictor variables. Partial Least Squares (PLS) regression is an extension of traditional multiple linear regression that is well suited to the statistical analysis of strongly correlated data. During modeling, PLS integrates and screens the information in the original variables to extract a small number of new components that best explain the system, and then builds the regression model on these components. In this sense PLS combines multiple linear regression, principal component analysis and canonical correlation analysis. Using simulated data and Yunnan power data, this thesis studies and discusses the PLS model in detail: the modeling theory, the solution and algorithm, algorithm simulation, parameter tuning and data analysis. It also compares multiple linear regression and PLS comprehensively, using criteria such as cross-validation and mean squared error. The data analysis shows that PLS performs better when the predictors are strongly correlated.

The other focus of this thesis is sparse partial least squares (SPLS) regression. Each new PLS component is a linear combination of all the original predictors. When the number of predictors is large, this makes the model hard to interpret and makes it harder to find the most important predictors. SPLS improves on PLS by shrinking the estimated coefficients, and coefficients that are small in absolute value are shrunk exactly to zero, so that the corresponding variables can be removed from the model.
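The shrinkage idea described above can be illustrated with a small NumPy sketch. The abstract does not say which penalty the thesis uses, so the soft-thresholding rule, the tuning parameter `eta`, and the helper names `soft_threshold` and `first_component` are assumptions made for illustration, and the toy data are synthetic. The sketch computes the first PLS weight vector (the covariance direction between the centered predictors and the response) and then a sparse version in which small weights become exactly zero.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink every entry of w toward zero by lam; entries with |w_j| <= lam become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def first_component(X, y, eta=0.0):
    """Weight vector and scores of the first PLS component.

    eta in [0, 1) is a hypothetical sparsity parameter: the threshold is
    eta * max|w_j|, so eta = 0 gives the ordinary (dense) PLS direction.
    """
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response
    w = Xc.T @ yc                    # direction of maximal covariance with y
    w = soft_threshold(w, eta * np.abs(w).max())
    w = w / np.linalg.norm(w)        # unit-length weight vector
    t = Xc @ w                       # component scores
    return w, t

# Synthetic, deterministic toy data: columns 1 and 2 are nearly collinear,
# column 3 is only weakly related to the response.
z = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([z, z + 0.05 * z**3, z**2 + 0.1 * z])
y = 2.0 * z

w_pls, _ = first_component(X, y)             # dense: every predictor gets a weight
w_spls, _ = first_component(X, y, eta=0.5)   # sparse: the weak predictor is dropped
print("PLS weights: ", w_pls)
print("SPLS weights:", w_spls)
```

In the dense direction every predictor receives a nonzero weight, whereas the thresholded direction sets the weight of the weakly related third predictor to zero, which is what allows that variable to be removed from the model.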
This thesis studies the SPLS algorithm and its implementation. Following the same approach used for PLS, it compares multiple regression, PLS and SPLS models comprehensively, and applies them to the Yunnan power data to identify the most important factors affecting electricity consumption. The simulation results show that both PLS regression and SPLS regression handle multicollinearity effectively, and that of the two, SPLS regression fits better and predicts more accurately. The study of the factors behind Yunnan's electricity consumption shows that electricity demand grows with Yunnan's economic development, with total retail sales and with fixed asset investment. Urbanization in Yunnan also drives electricity demand across society, and a rise in the consumer price index has a positive but small effect on demand. The results show that, compared with PLS regression, SPLS regression has a smaller prediction error and yields a more accurate predictive model. Finally, the thesis summarizes the work on PLS and SPLS regression and gives an outlook.
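The kind of comparison described above, scoring models by cross-validated mean squared error on collinear data, might be sketched as follows. This is not the thesis's code or data: the simulated design, the fold count, and the function names `pls1_fit`, `ols_fit` and `cv_mse` are assumptions, and the PLS fit is a standard single-response (PLS1) deflation scheme written with NumPy only.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 regression by successive deflation; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # weight vector
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - c * t                  # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    beta, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)
    return beta, y_mean - x_mean @ beta

def cv_mse(fit, X, y, k=5):
    """Mean squared prediction error averaged over k contiguous folds."""
    idx = np.arange(len(y))
    errs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        beta, b0 = fit(X[train], y[train])
        errs.append(np.mean((y[test] - (X[test] @ beta + b0)) ** 2))
    return float(np.mean(errs))

# Simulated data (an assumption): six predictors that are all noisy copies
# of one latent variable, so they are strongly collinear.
rng = np.random.default_rng(0)
z = rng.normal(size=40)
X = np.column_stack([z + 0.05 * rng.normal(size=40) for _ in range(6)])
y = z + 0.5 * rng.normal(size=40)

mse_ols = cv_mse(ols_fit, X, y)
mse_pls = cv_mse(lambda A, b: pls1_fit(A, b, 1), X, y)
print("CV MSE, OLS:", mse_ols)
print("CV MSE, PLS (1 component):", mse_pls)
```

The number of PLS components is fixed at one here for brevity; in practice it would itself be chosen by cross-validation.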
Keywords/Search Tags: Partial Least Squares regression model, Sparse Partial Least Squares regression model, Yunnan electricity demand, cross-validation