The Research Of SLT-PLS Regression Algorithm Based On Ensemble Learning

Posted on: 2014-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: G D Fan
GTID: 2348330473451129
Subject: Computer technology
Abstract/Summary:
The analysis of complex samples is a major challenge in analytical chemistry and industry. Near-infrared (NIR) spectroscopy offers simple sample preparation, fast analysis and nondestructive measurement, and it has been widely used in medicine, food, petrochemicals, agriculture and other fields. However, near-infrared spectra are broad and heavily overlapping, so quantitative information about a component must be obtained with a multivariate calibration method such as multiple linear regression (MLR), principal component regression (PCR) or partial least squares (PLS).
The primary objective of PCR is to extract the hidden information in the matrix X and then predict the values of the Y variables. Because only the retained components are used as independent variables, noise is eliminated and the quality of the predictive model improves. However, PCR still has shortcomings. When some useful variables are only weakly correlated, they are easily missed when the principal components are selected, which reduces the reliability of the prediction model, and selecting each component carefully by hand is too difficult. PLS addresses this problem. It decomposes both the X and Y variables, extracts components (often referred to as factors) from both, and then arranges the factors in descending order of the correlation between them. But when X-space contains directions with large variance that are unrelated to Y, the PLS model may not work well. Such Y-irrelevant regularities, for example baseline shifts and unimportant regions in the spectral data, complicate the interpretation of the PLS model and in some cases lead to unnecessarily large prediction errors. To remedy this defect, Bi Yiming et al. proposed the SLT-PLS algorithm.
However, when the samples are unevenly distributed, SLT-PLS does not predict well and cannot achieve the desired accuracy. To address this shortcoming, this thesis introduces ensemble learning. Since the 1990s, ensemble learning has gradually become the first of the four major research directions in machine learning. It can effectively improve the generalization ability and overall performance of machine learning algorithms, and it has been successfully applied to web information retrieval and filtering, data mining and other fields.
This thesis presents a new modular algorithm for the quantitative analysis of infrared spectra. The algorithm combines bagging with SLT-PLS and is called bagging-SLT-PLS (BSPLS). First, the random sampling of bagging lets each model be built from fewer samples, which saves time and also increases the number of models. Second, each sub-model is built with the SLT-PLS algorithm, which resolves the problem that PLS predicts poorly when X-space contains many variables unrelated to Y. Finally, the sub-models are combined under two weighting rules: average weighting, and cross-validation weighting with selective ensemble. The weighting rules can exclude models with a small contribution and retain models with a large contribution, which solves the problem that SLT-PLS predicts poorly when the samples are unevenly distributed.
The BSPLS experiments in this thesis analyze both weighting rules. Results on four public NIR data sets show that the proposed BSPLS algorithm gives more accurate predictions and outperforms the conventional PLS algorithm, SLT-PLS, and several ensemble-based PLS algorithms, including bagging PLS and stacking PLS.
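The bagging and weighting steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify SLT-PLS, so an ordinary single-response PLS (NIPALS) stands in for the sub-model; the error on samples left out of each random draw stands in for the cross-validation error; the inverse-error weights and the `keep_frac` cutoff are assumed forms of the selective rule; and all function and parameter names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (stand-in for SLT-PLS)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X-space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def bagging_pls_fit(X, y, n_models=30, sample_frac=0.6, n_components=3,
                    keep_frac=0.5, rule="selective", seed=0):
    """Train sub-models on random subsets and compute combination weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = max(n_components + 2, int(sample_frac * n))
    models, mse = [], []
    for _ in range(n_models):
        idx = rng.choice(n, size=m, replace=True)       # random draw with replacement
        held_out = np.setdiff1d(np.arange(n), idx)      # samples not drawn
        if held_out.size == 0:                          # fallback: score on the draw itself
            held_out = idx
        model = pls1_fit(X[idx], y[idx], n_components)
        resid = y[held_out] - pls1_predict(model, X[held_out])
        models.append(model)
        mse.append(np.mean(resid ** 2))
    mse = np.array(mse)
    if rule == "average":
        weights = np.full(n_models, 1.0 / n_models)     # equal weights
    else:
        # Assumed selective rule: keep the best keep_frac of the models and
        # weight them by inverse held-out error; the rest get weight zero.
        n_keep = max(1, int(round(keep_frac * n_models)))
        keep = np.argsort(mse)[:n_keep]
        weights = np.zeros(n_models)
        weights[keep] = 1.0 / (mse[keep] + 1e-12)
        weights /= weights.sum()
    return models, weights

def bagging_pls_predict(models, weights, X):
    preds = np.array([pls1_predict(m, X) for m in models])
    return weights @ preds                              # weighted average of sub-models
```

Calling `bagging_pls_fit(X_train, y_train)` and then `bagging_pls_predict(models, weights, X_test)` gives the combined prediction; passing `rule="average"` switches to equal weights, which allows the two weighting rules to be compared on the same sub-models.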
Keywords/Search Tags: multivariate calibration, partial least squares, ensemble learning, selective weight rule