China's atmospheric environment field has accumulated large, multidimensional and heterogeneous data systems covering atmospheric pollutant concentrations, composition, meteorological conditions, emission sources and related variables. Accurately predicting, simulating and analyzing atmospheric pollutant concentrations from these data systems remains difficult. To address the poor interpretability of existing machine learning models and their limited ability to extract the temporal correlation between atmospheric pollutants and driving factors, this paper proposes an improved interpretable machine learning model and uses it to build an atmospheric pollution analysis system of "environmental data processing-pollution characteristics analysis-prediction-driving factors". Taking Jincheng City in Shanxi Province as a case study, we examined the characteristics, prediction and driving factors of urban O3 pollution in summer and PM2.5 pollution in autumn and winter. The results show that the method can predict and simulate changes in air pollutant concentrations and support data mining, and that the resulting analysis system can provide a basis for accurate urban air quality prediction and for pollution prevention and control.

(1) Based on Correlation-ML-SHAP multi-module coupling, we constructed an improved interpretable machine learning model. The environmental time series correlation module (Correlation) analyzes the temporal correlation between atmospheric pollutants and driving factors and optimizes the combination of model inputs. The atmospheric pollutant concentration prediction module (ML) trains and optimizes a high-performing prediction model. The interpretability module (SHAP) quantitatively analyzes the driving factors, and their interactions, that affect changes in atmospheric pollutant concentrations.

(2) We used the proposed method to predict O3 concentration in summer and PM2.5 concentration in autumn and winter. The results show that the XGBoost model, which is based on ensemble learning, has the best prediction accuracy. For O3 concentration in summer, the MAE, RMSE and R² are 15.0, 19.5 and 0.85, respectively. For PM2.5 concentration in autumn and winter, the MAE, RMSE and R² are 6.3, 8.9 and 0.94, respectively. These results indicate that the method is suitable for predicting O3 concentration in summer and PM2.5 concentration in autumn and winter.

(3) The proposed method can identify the driving factors that affect changes in O3 concentration in summer and PM2.5 concentration in autumn and winter. The results show that atmospheric pollutant concentrations are affected by multiple driving factors. The strong driving factors of O3 concentration in summer are temperature, solar radiation and precursor concentration, whose importance for O3 concentration increases by 24.4%, 18.1% and 22.8%, respectively, on polluted days. When NO2 concentration exceeds 9 μg/m³ or CO concentration exceeds 0.7 mg/m³, the interaction of high temperature and low humidity has a 93.4% and 87.2% probability, respectively, of increasing O3 concentration. The strong driving factors of PM2.5 pollution in autumn and winter are PM10, NO2 and humidity, whose importance for PM2.5 concentration increases by 87.0%, 77.7% and 4.2%, respectively, on polluted days. When PM10 concentration exceeds 90 μg/m³ or NO2 concentration exceeds 10 μg/m³, the interaction of high humidity and low temperature has a 96.7% and 86.0% probability, respectively, of increasing PM2.5 concentration.
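
As an illustration of two steps named in this abstract, the following minimal Python sketch shows one way a temporal correlation between a driving factor and a pollutant series could be screened, and how MAE, RMSE and R² are computed from predictions. It is a sketch under stated assumptions, not the paper's implementation: the abstract does not say how the Correlation module measures temporal correlation, so the lagged Pearson correlation used here is an assumption, and the function names `lagged_correlation`, `best_lag` and `regression_metrics` are hypothetical.

```python
import numpy as np


def lagged_correlation(driver, target, max_lag):
    """Pearson correlation between driver[t - lag] and target[t], for lag = 0..max_lag.

    Assumption: lagged Pearson correlation stands in for the paper's
    Correlation module, whose exact definition is not given in the abstract.
    """
    driver = np.asarray(driver, dtype=float)
    target = np.asarray(target, dtype=float)
    if driver.shape != target.shape:
        raise ValueError("driver and target must have the same length")
    corrs = []
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # driver values, shifted back by `lag` steps
        y = target[lag:]                 # target values aligned with those driver values
        corrs.append(np.corrcoef(x, y)[0, 1])
    return np.array(corrs)


def best_lag(driver, target, max_lag):
    """Return (lag, correlation) for the lag with the largest absolute correlation."""
    corrs = lagged_correlation(driver, target, max_lag)
    lag = int(np.argmax(np.abs(corrs)))
    return lag, float(corrs[lag])


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {"MAE": mae, "RMSE": rmse, "R2": 1.0 - ss_res / ss_tot}


if __name__ == "__main__":
    # Synthetic data only: the target is the driver delayed by 3 time steps.
    rng = np.random.default_rng(0)
    driver = rng.normal(size=200)
    target = np.roll(driver, 3)
    print(best_lag(driver, target, max_lag=6))
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
```

In a workflow like the one described, the lag selected for each driving factor could be used to build lagged input columns before a model such as XGBoost is trained, and the metrics function could then be applied to held-out predictions.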