| Total organic carbon (TOC) content and oil content are important indicators for evaluating the quality of oil shale. Detection based on physicochemical experiments is complex and cumbersome and places high technical demands on inspectors, so it cannot meet the requirement of rapid detection in the production process. Near-infrared spectroscopy offers fast detection, requires neither sample destruction nor chemical reagents, and is simple to operate. This paper takes 230 core samples collected from a block in the Songliao Basin as the research object. Detection models for the total organic carbon content and oil content of oil shale based on near-infrared spectroscopy are established, and the influence of single models and ensemble learning models on prediction accuracy is discussed. The research work is as follows. (1) A detection model for the total organic carbon content of oil shale was established. The Monte Carlo method was used to eliminate abnormal samples, and detrending combined with baseline correction was used to preprocess the spectra. The successive projections algorithm (SPA), the uninformative variable elimination (UVE) algorithm and the competitive adaptive reweighted sampling (CARS) algorithm were each used to select characteristic wavelengths. Partial least squares (PLS), support vector machine (SVM) and random forest (RF) models were established to predict the total organic carbon content of oil shale. The results show that the nonlinear RF and SVM models outperform the linear PLS model. This is because the carbon in oil shale samples exists in many kinds of hydrocarbons whose absorption peaks affect each other, resulting in a complex nonlinear relationship between the total organic carbon content and the near-infrared spectral data. Among the three models, the CARS-SVM model performs best, with values of 0.9066 and 0.2220 for R_p^2 and RMSEP 
respectively. Near-infrared spectroscopy is therefore feasible for the rapid detection of TOC content in oil shale, and the CARS-SVM model gives good detection performance. (2) An ensemble learning model for detecting the oil content of oil shale was established. To overcome the difficulty of further improving the prediction accuracy of a single model, a heterogeneous ensemble learning model based on the Stacking framework, combined with near-infrared spectroscopy, was adopted to detect the oil content of oil shale. The data set remaining after removal of abnormal samples was randomly divided into a training set and a test set at a ratio of 3:1. Detrending coupled with baseline correction was used to eliminate the influence of noise and baseline drift in the spectral data. The random forest (RF) algorithm was then used to extract characteristic wavelengths according to their importance. To further reduce the data dimension, characteristic wavelengths were subsequently extracted with the CARS algorithm. Finally, a stacking ensemble learning model was constructed with PLS, SVM, RF and GBDT as primary learners and a PLS model as the secondary learner. The results show that the RF-CARS method can effectively screen important wavelengths. Compared with the single models (SVM, PLS) and the homogeneous ensemble learning models (RF, GBDT), the stacking heterogeneous ensemble learning model significantly improves generalization and prediction ability. The R^2 and RMSEP of the stacking ensemble learning model are 0.9174 and 0.6601, respectively. The heterogeneous ensemble learning model based on stacking combines the advantages of its primary learners and thereby improves the performance of the oil content detection model. |
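
The abstract reports model quality as a coefficient of determination and a root mean square error of prediction (RMSEP) on a test set obtained from a random 3:1 split, without stating the formulas. The following is a minimal sketch of how such figures are conventionally computed, assuming the usual definitions R^2 = 1 - SS_res/SS_tot and RMSEP = sqrt(mean squared test error). The function names, the random seed, the sample count of 200 and the five reference/predicted values are hypothetical and are not taken from the study.

```python
import math
import random


def split_3_to_1(n_samples, seed=0):
    """Randomly split sample indices into training and test sets at a 3:1 ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * 3 / 4)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over the test samples."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted values for a five-sample test set.
y_ref = [2.0, 4.0, 6.0, 8.0, 10.0]
y_hat = [2.5, 3.5, 6.0, 8.5, 9.5]

train_idx, test_idx = split_3_to_1(200)  # 200 is an illustrative sample count
print(len(train_idx), len(test_idx))     # 150 50
print(round(rmsep(y_ref, y_hat), 4), round(r2(y_ref, y_hat), 4))  # 0.4472 0.975
```

Under these definitions, a higher R^2 together with a lower RMSEP indicates better predictive performance. RMSEP is expressed in the units of the predicted quantity, so values for TOC content and for oil content are not directly comparable with each other.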