
Research On Air Pollutant Concentration Prediction Model Based On Ensemble Learning And Interpretability Method

Posted on: 2022-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Jia
GTID: 2491306491484444
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, air pollutant concentration forecasting has received widespread attention, and improving the accuracy and stability of predictive models has become the focus of forecasting work. Ensemble learning models are often used for this purpose, and the choice of base learners and ensemble strategy directly affects their predictive performance. At the same time, most ensemble prediction models are black boxes: they cannot demonstrate the credibility of their predictions, which leaves the results open to dispute. In response to these problems, we propose an improved ensemble learning method, S-MStacking, and build an air pollutant concentration prediction model based on it. After building the model, we use interpretability methods to interpret and analyze its predictions. We take hourly air pollutant concentration data and historical meteorological data from Lanzhou City as the research object, and the current air pollutant concentration as the forecast target. The specific work mainly includes:

(1) Data analysis and pre-processing before building the model. We analyze the basic characteristics of the data and the correlations between features, and present the results visually. We then perform data pre-processing and feature engineering; in the feature selection stage, a hybrid integrated feature selection method is proposed to rank the importance of features.

(2) An improved ensemble learning method and model construction. We propose the improved ensemble learning method S-MStacking. First, similar base learners are pruned using the K-Means clustering algorithm. Second, to ensure the best predictive performance of the ensemble model, the multi-objective optimization algorithm MOBA is used to select the base learners. Finally, we propose an improved ensemble strategy, MStacking, to combine the base learners and further improve the predictive performance of the ensemble model. Comparison with different single models and different ensemble models shows that the air pollutant concentration prediction model based on S-MStacking has clear advantages in prediction accuracy and stability.

(3) Model interpretability analysis. Using three model-agnostic methods (Feature Importance, Accumulated Local Effects Plot, and SHAP), we analyze the model's predictions from the perspective of features; the results show that the predictions of the proposed model are credible.
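The abstract does not say how S-MStacking represents base learners when pruning similar ones with K-Means. The following is a minimal Python sketch of one plausible reading, not the thesis's implementation. It assumes each learner is described by its vector of predictions on a validation set, that these vectors are clustered with K-Means, and that only the lowest-RMSE learner in each cluster is kept. The function names, learner names, and toy numbers are all hypothetical, and the deterministic farthest-first initialization is an illustrative choice. MOBA selection and the MStacking combiner are not shown.

```python
import math


def rmse(pred, truth):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def prune_similar_learners(val_preds, y_val, k):
    """Assumed pruning rule: cluster learners by their validation
    predictions and keep the lowest-RMSE learner in each cluster."""
    names = list(val_preds)
    labels = kmeans_labels([val_preds[n] for n in names], k)
    best = {}
    for name, lab in zip(names, labels):
        if lab not in best or rmse(val_preds[name], y_val) < rmse(val_preds[best[lab]], y_val):
            best[lab] = name
    return sorted(best.values())


# Toy validation targets and predictions (made up for illustration only).
y_val = [10, 20, 30, 40]
toy_preds = {
    "ridge": [11, 19, 31, 39],
    "lasso": [11.5, 19.5, 30.5, 39.5],
    "rf": [14, 24, 26, 44],
    "gbdt": [14.5, 23.5, 26.5, 43],
    "knn": [5, 15, 35, 33],
}

kept = prune_similar_learners(toy_preds, y_val, k=3)
print(kept)  # ['gbdt', 'knn', 'lasso']
```

In this toy run the two linear learners fall into one cluster and the two tree learners into another, so one learner from each pair is dropped. Clustering on prediction vectors is only one way to define "similar"; the thesis may use a different representation.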
Keywords/Search Tags: Air pollutant concentration prediction, K-Means, Multi-objective optimization, S-MStacking, Model-agnostic methods