| Ozone (O3) is an important trace gas in the Earth’s atmosphere, and excessive near-surface O3 concentrations endanger the health of humans, animals, and plants. Ground-level O3 is monitored mainly by ground stations and by satellite remote sensing. Monitoring stations are limited in number; satellite remote sensing can cover large areas, but interference from stratospheric O3 makes it difficult to obtain accurate tropospheric O3 concentrations. Satellite-observed O3 precursors combined with machine learning algorithms are widely used to estimate ground-level O3 concentrations. Based on O3 photochemistry, TROPOMI NO2 and HCHO are often used as the main features of machine learning O3 estimation models, whereas CO, which also takes part in the photochemical reactions and contributes positively to O3, is usually overlooked. In addition, machine learning models are black boxes, which makes it difficult to understand the relationship between input variables and outputs. Therefore, based on TROPOMI CO, NO2, and HCHO column concentration products and other auxiliary data, this thesis uses machine learning algorithms to build a ground-level O3 concentration estimation model and applies the SHAP interpretation method to explain the model for three periods (whole year, warm season, and cold season). SHAP values are used to analyze how the contributions of the feature variables to the predicted O3 value differ between periods, to identify feature importance, and to improve the interpretability of the model. For the warm-season model, emission-reduction scenarios were evaluated for CO, NO2, and HCHO. Furthermore, because cloud cover and retrieval-algorithm limitations leave large gaps in TROPOMI products, the O3 estimates based on them have low coverage, so three schemes are used to improve coverage. The main conclusions are as follows: (1) The model 
performance of the RF and XGBoost algorithms was compared. The XGBoost model performed better, with a ten-fold cross-validation R² of 0.86 and an RMSE of 15.95 μg/m³. The effects of CO, NO2, and HCHO on model accuracy were also compared, and adding CO did more to improve model accuracy. (2) Temperature is very important in the models for all periods and is the main factor influencing O3 in the cold season. In the warm season, CO is the main influencing factor, followed by temperature and then NO2, while HCHO has a weaker effect. In the warm-season model, a spatial comparison of the relative contributions of CO, NO2, and HCHO to the predicted O3 value shows that positive contributions from CO dominate in the North China Plain and surrounding areas, whereas positive contributions from NO2 dominate in the Chengdu-Chongqing region and the Yangtze River Delta. After CO, NO2, and HCHO were each reduced by 10% in the warm-season model, the predicted mean O3 decreased by 6.0, 1.3, and 0.5 μg/m³, respectively, with the largest decrease following the CO reduction. (3) Three schemes for improving the coverage of the O3 estimates were compared: reconstructing the missing TROPOMI data with the DINEOF method before modeling (Scheme 1), setting the missing data to null values and training the XGBoost model on them (Scheme 2), and combining the modeling results obtained with and without satellite observations (Scheme 3). Scheme 1 performs best: its model accuracy is essentially the same as that of Scheme 2 and higher than that of Scheme 3, its accuracy in the reconstructed-data portion is the highest, it markedly reduces the underestimation of high values when heavy O3 pollution occurs in the reconstructed area, and the spatial distribution of its results is more reasonable. After reconstruction, the average coverage of the model-estimated ground-level O3 concentration increased from 33.6% to 97.2%. Adding reconstructed 
TROPOMI data to the modeling can improve both result coverage and model performance. This thesis has 31 figures, 6 tables, and 100 references. |
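
The 10% reduction experiment in conclusion (2) can be read as a perturbation test on a fitted model: scale one precursor feature by 0.9, predict again, and compare the mean prediction with the baseline. The following is a minimal sketch of that idea only. The function names, the feature keys `co`, `no2`, and `hcho`, the sample values, and the toy linear predictor with its coefficients are all hypothetical stand-ins for illustration; they are not the thesis's XGBoost model or data.

```python
from typing import Callable, Dict, Sequence

Row = Dict[str, float]


def mean_prediction(predict: Callable[[Row], float], rows: Sequence[Row]) -> float:
    """Average model output over all samples."""
    return sum(predict(r) for r in rows) / len(rows)


def reduction_effect(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    fraction: float = 0.10,
) -> float:
    """Change in mean prediction when `feature` is scaled by (1 - fraction).

    All other features are left untouched; a negative result means the
    model predicts lower mean O3 after the reduction.
    """
    perturbed = [{**r, feature: r[feature] * (1.0 - fraction)} for r in rows]
    return mean_prediction(predict, perturbed) - mean_prediction(predict, rows)


def toy_predict(row: Row) -> float:
    # Hypothetical stand-in for a fitted model; coefficients are invented.
    return 40.0 + 0.5 * row["co"] + 0.2 * row["no2"] + 0.1 * row["hcho"]


# Invented sample rows, used only to exercise the functions.
rows = [
    {"co": 100.0, "no2": 50.0, "hcho": 20.0},
    {"co": 120.0, "no2": 30.0, "hcho": 40.0},
]

effects = {
    name: reduction_effect(toy_predict, rows, name)
    for name in ("co", "no2", "hcho")
}
```

With a real fitted model, `toy_predict` would be replaced by a wrapper around that model's prediction call. Perturbing one feature at a time ignores any co-variation between precursors, so the result describes the model's sensitivity rather than a physical emission-control outcome.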