| Spectral analysis has been widely used in medicine, chemical engineering, agricultural products, and food because it is fast, nondestructive, simple to operate, and suitable for on-line analysis of complex samples such as petroleum, tobacco, and medicines. However, spectra of complex samples often suffer from band overlap, background interference, and serious noise. Quantitative and qualitative analysis therefore relies on chemometric methods, in which an accurate prediction model for unknown samples is the key to spectral analysis. In traditional multivariate calibration, the quantitative model between the spectra and the target values is established by a single modeling method, which often gives unsatisfactory predictions. Ensemble modeling instead generates multiple training subsets, builds a sub-model on each subset with a basic modeling method, and integrates the sub-model results into the final prediction. This approach can achieve higher prediction accuracy and stability. Training subset generation, the basic modeling method, and sub-model integration are the three main aspects of ensemble modeling. This thesis therefore studies these three aspects and proposes three new modeling methods, which are applied to the quantitative analysis of complex samples. The contents of this thesis are as follows: 1. For the basic modeling method, the extreme learning machine (ELM) is introduced and its feasibility for quantitative analysis of complex samples is verified. Traditional modeling methods are divided into linear and non-linear methods. Linear methods are efficient and have few parameters, but they perform poorly on non-linear problems. Non-linear methods are well suited to non-linear problems, but these methods have many parameters, are time-consuming, and easily
fall into local optima. ELM is therefore introduced to combine the advantages of linear and non-linear methods. The efficiency and stability of the method are investigated first. Then the optimal activation function and number of hidden-layer nodes are determined using a newly defined parameter. The predictive performance of ELM is compared with that of principal component regression (PCR), partial least squares (PLS), support vector regression (SVR), and back-propagation artificial neural network (BP-ANN) on three spectral datasets. The results show that, for a given dataset, the efficiency of ELM is mainly affected by the number of nodes. Despite some instability, ELM becomes stable near the optimal parameters. Moreover, ELM performs better than or comparably to its competitors, offering high prediction accuracy and high speed. 2. For training subset selection, a novel boosting strategy that builds sub-models in the variable direction, named variable space boosting partial least squares (VS-BPLS), is proposed. In this method, PLS sub-models based on the variable space are built over multiple cycles. In the first cycle, all variables in the training set are given the same sampling weight, and a certain number of variables are selected to build a PLS sub-model. In the following cycles, variables with larger training error receive greater sampling weights. The final prediction is the weighted average of the predictions of all sub-models. The performance of VS-BPLS is tested on two small-sample spectral datasets. For comparison, traditional PLS, Monte Carlo uninformative variable elimination PLS (MCUVE-PLS), and randomization test PLS (RT-PLS) are also investigated. The results show that VS-BPLS is superior to the other three methods in prediction accuracy and stability. By selecting a subset of variables in the variable direction to build each sub-model, the method not only alleviates the small-sample problem but also deletes
the redundant information. 3. For ensemble modeling, the unfolded strategy and empirical mode decomposition (EMD) are introduced to generate multiple sub-models in the frequency direction, and a novel regression model named high and low frequency unfolded PLSR (HLUPLSR) is proposed. In the proposed method, the original signals are first decomposed by EMD into a finite number of intrinsic mode functions (IMFs) and a residue. Second, the leading high-frequency IMFs are summed into a high-frequency matrix, and the remaining IMFs and the residue are summed into a low-frequency matrix. Finally, the two matrices are unfolded into an extended matrix along the variable dimension, and a PLS model is built between the extended matrix and the target values. The method is applied to determine the hydrocarbon contents of light gas oil and diesel fuel samples. Compared with single PLS, first derivative-PLS (1st-PLS), and continuous wavelet transform-PLS (CWT-PLS), HLUPLSR shows superior prediction accuracy. HLUPLSR combines the advantages of EMD, the unfolded strategy, and PLS. It not only makes full use of the local feature information of the spectral signal but also avoids having to select weights for the sub-models. |
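
The high/low-frequency unfolding step of HLUPLSR described in item 3 can be illustrated with a minimal sketch for a single spectrum. It assumes the IMFs and residue have already been computed by some EMD implementation and are ordered from high to low frequency. The function name `unfold_high_low`, the split parameter `n_high`, and the toy numbers are all hypothetical and are not taken from the thesis.

```python
def unfold_high_low(imfs, residue, n_high):
    """Build one row of the extended matrix from an EMD decomposition.

    imfs    : list of IMFs for one spectrum, ordered high to low frequency;
              each IMF is a list of floats with one value per variable
    residue : EMD residue for the same spectrum (same length as each IMF)
    n_high  : number of leading IMFs treated as high frequency (assumed
              to be chosen by the user; the abstract does not specify it)
    """
    if not 1 <= n_high <= len(imfs):
        raise ValueError("n_high must be between 1 and the number of IMFs")
    # Sum the leading IMFs point by point: the high-frequency part.
    high = [sum(column) for column in zip(*imfs[:n_high])]
    # Sum the remaining IMFs and the residue: the low-frequency part.
    low = [sum(column) for column in zip(*imfs[n_high:], residue)]
    # Unfold along the variable dimension: place the two parts side by side.
    return high + low


# Toy decomposition of a 4-point signal (illustrative values, not real EMD output).
imfs = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
    [0.25, 0.0, -0.25, 0.0],
]
residue = [2.0, 2.0, 2.0, 2.0]
extended = unfold_high_low(imfs, residue, n_high=2)
```

Applying the function to every spectrum in a dataset would give an extended matrix with twice as many variables as the original spectra, on which a single PLS model could then be fitted. Because the high and low parts enter one regression, no separate sub-model weights need to be chosen, which matches the advantage claimed above.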