Accurate prediction of the air quality index (AQI) plays an important role in controlling air pollution and improving air quality. However, AQI is highly volatile and uncertain, and its influencing factors are complex. Most existing studies considered the influence of meteorological conditions on AQI but ignored the influence of key pollution sources on air quality and the spatial correlation of AQI between stations. In this paper, exhaust emissions of key pollution sources were included in the impact index system for the first time, and meteorological information and spatial correlation were also introduced as inputs, so that the influence of multiple indicators and regions on AQI prediction was fully considered. To improve the prediction performance and efficiency of the model, this paper proposed the DTW-SARIMA-CNN-LSTM air quality prediction model. Taking the air quality data, meteorological data and key pollution source emission data of Taiyuan, Datong, Xinzhou and Shuozhou as the research object, the AQI of Taiyuan was predicted, and the data of the other three cities were used to further verify the conclusion. Firstly, feature selection was carried out using dynamic time warping (DTW), a method that measures the similarity of two time series according to their rising and falling trends; the index features with an important influence on AQI were selected from the perspective of time series similarity. Secondly, the SARIMA-CNN-LSTM combination model was constructed from the two aspects of data decomposition and result integration. The time series was first decomposed: the SARIMA model was used for single-index modeling to extract the linear information contained in the AQI sequence, such as cycle and trend, and the information that SARIMA could not explain was output as the residual term. Then, a CNN-LSTM model taking several related indexes such as PM2.5 as input was constructed to make a second prediction on the residual of the SARIMA model. Its input layer contained not only the SARIMA residual but also the covariates with an important influence on the change of AQI, as well as the spatial position relationship between stations. In this way, the nonlinear relationship between the residual and multiple variables was extracted, and the AQI of Taiyuan was predicted accurately in both the temporal and spatial dimensions. The experimental results showed that the RMSE and MAE of the SARIMA-CNN-LSTM model, which took meteorological factors and exhaust emissions as influencing indicators, were reduced by 37.61% and 34.59%, respectively, compared with the general CNN-LSTM model, so its predictions were more accurate. Meanwhile, the important features selected by the DTW method gave a smaller experimental error than those selected by the random forest (RF) method: the RMSE and MAE of the RF-SARIMA-CNN-LSTM model were reduced to 23.3072 and 15.8961, while those of the DTW-SARIMA-CNN-LSTM model were reduced to 18.7949 and 12.9218. Finally, the DTW-SARIMA-CNN-LSTM model was used to predict the AQI of Datong, Shuozhou and Xinzhou, respectively, to further verify its validity.
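The DTW-based feature selection described above can be illustrated with a minimal sketch. It uses the textbook dynamic-programming form of DTW with an absolute-difference local cost and ranks candidate indicators by their DTW distance to the AQI series, with a smaller distance meaning a more similar series. The function names `dtw_distance` and `rank_features`, the indicator names and the toy values are all hypothetical, and the sketch assumes the series have already been scaled to a comparable range; the normalization and selection threshold actually used in the paper are not stated here.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_features(target, candidates):
    """Return candidate names ordered from most to least similar to target."""
    return sorted(candidates, key=lambda name: dtw_distance(target, candidates[name]))


# Hypothetical toy series (not data from the paper), assumed pre-scaled.
aqi = [1.0, 2.0, 3.0, 4.0]
indicators = {
    "pm25": [1.0, 2.0, 2.0, 3.0, 4.0],  # same shape, stretched in time
    "wind": [4.0, 3.0, 2.0, 1.0],       # opposite trend
}
ranking = rank_features(aqi, indicators)
print(ranking)
```

Because DTW allows one series to be stretched or compressed along the time axis, the time-shifted `pm25` toy series aligns with the AQI series at zero cost, whereas the oppositely trending `wind` series does not. Keeping only the top-ranked indicators would then correspond to the feature selection step.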