Suspended matter in the atmosphere is classified into several types according to particle size. Atmospheric particulate matter with an aerodynamic diameter of less than 2.5 μm is commonly referred to as fine particulate matter (PM2.5). PM2.5 not only causes respiratory diseases in humans but also indirectly affects the immune system, leading to cardiovascular and cerebrovascular diseases and even cancer. With the accelerating pace of national development, environmental issues have come to the fore. Air pollution remains a major problem in China, which has some of the highest PM2.5 levels in the world, with the five provinces and municipalities of Beijing, Tianjin, Hebei, Shandong and Henan being the most polluted. At present, PM2.5 mass concentrations are obtained mainly from air quality monitoring stations. Across vast areas with complex topography these stations are scarce and unevenly distributed, so it is difficult to obtain spatially continuous PM2.5 concentration data by interpolating or extrapolating station measurements. With the continuing development of satellite remote sensing, spatially continuous PM2.5 data can be generated from satellite products, but existing aerosol optical depth (AOD) products have large gaps caused by clouds, snow and other factors, and how to fill these gaps is a problem worth considering. Machine learning methods are more accurate in simulating PM2.5 concentrations, but most current studies use a single machine learning model, or compare only a few models within one region. Hyperparameter optimization and model selection are then usually done by a combination of intuition and trial and error, and the process lacks rigor. Therefore, this paper introduces an automatic machine learning method to compare multiple machine learning algorithms and selects the MODIS multi-angle aerosol optical depth
product MCD19A2 (MODIS AOD), which has high spatial and temporal resolution, to simulate PM2.5. Because MODIS AOD has severe spatial gaps caused by clouds and snow, AOD data from the MERRA2 reanalysis dataset are used to fill the missing areas of MODIS AOD. This paper also uses meteorological data, population density data, elevation data and PM2.5 measurements from monitoring stations, among other data. Taking the five provinces and municipalities of Beijing, Tianjin, Hebei, Shandong and Henan as the study area, it uses automatic machine learning to compare the simulation accuracy of multiple machine learning algorithms, selects the algorithm with the highest accuracy to simulate daily PM2.5 concentrations in the study area, and analyzes their spatial and temporal characteristics.

The main conclusions of this paper are as follows:

(1) MODIS AOD has high resolution, but clouds, snow and other factors leave large spatial gaps. Filling these gaps with MERRA2 AOD data yields full-coverage AOD data for the whole study area.

(2) Correlation analysis of the variables affecting PM2.5 concentration shows that surface temperature, elevation, boundary layer height, solar radiation and vegetation index have significant negative correlations with PM2.5 concentration, while AOD, population density and nighttime light have significant positive correlations. Interpreting the extremely randomized trees model with the SHAP model interpretation method shows that AOD and surface temperature have a large effect on PM2.5 estimates, whereas the nighttime light index and population density have a small effect.

(3) The extremely randomized trees model is more suitable for simulating daily PM2.5 concentrations
in Beijing, Tianjin, Hebei, Shandong and Henan than the other 24 machine learning methods and the linear regression methods in automatic machine learning, with higher simulation accuracy and smaller error.

(4) In 2020, the more seriously polluted areas of Beijing, Tianjin, Hebei, Shandong and Henan were mainly in southern Hebei Province, northern Henan Province and western Shandong Province. Low PM2.5 concentrations were found mainly in Zhangjiakou and Chengde in northern Hebei Province and in Weihai in Shandong Province. Overall, the spatial distribution is high inland and low on the coast, high in the center and low in the surrounding areas, and high in the south and low in the north.

(5) In 2020, PM2.5 pollution in Beijing, Tianjin, Hebei, Shandong and Henan was most serious in January and lightest in August. Concentrations were high in winter and low in summer, with spring and autumn in between. In the non-heating season, the maximum PM2.5 concentration was 51.31 μg/m³, in Anyang City, Henan Province, and the minimum was 11.26 μg/m³, in Chengde City, Hebei Province. Pollution was more serious during the heating season: days on which PM2.5 concentrations exceeded the national secondary standard accounted for 36.17% of all heating-season days, while PM2.5 concentrations in the non-heating season all met the national secondary standard.
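
The heating-season figure in conclusion (5) is a share of days on which the daily PM2.5 concentration exceeds a threshold. The following is a minimal sketch of that kind of calculation, not the procedure used in this paper. It assumes that the "national secondary standard" refers to the 24-hour Grade II PM2.5 limit of 75 μg/m³ in GB 3095-2012, which the text above does not state explicitly. The function name `exceedance_share` and the sample values are hypothetical and serve only as illustration.

```python
from typing import Iterable

# Assumption: 75 ug/m3 is the 24-hour Grade II PM2.5 limit in GB 3095-2012.
# The abstract itself does not state the threshold value.
GRADE_II_DAILY_LIMIT = 75.0


def exceedance_share(daily_pm25: Iterable[float],
                     limit: float = GRADE_II_DAILY_LIMIT) -> float:
    """Return the percentage of days whose PM2.5 value is strictly above `limit`."""
    values = list(daily_pm25)
    if not values:
        raise ValueError("daily_pm25 must contain at least one value")
    exceeding = sum(1 for v in values if v > limit)
    return 100.0 * exceeding / len(values)


# Hypothetical daily concentrations (ug/m3), made up for illustration only.
sample = [42.0, 80.5, 75.0, 120.3, 30.1]
print(f"{exceedance_share(sample):.2f}% of days exceed the limit")
```

In this sketch a day counts as exceeding only when its value is strictly above the limit, so a day at exactly 75 μg/m³ is treated as meeting the standard. Whether the paper applies the threshold to station values, to grid cells or to a regional mean is not specified here, so the input series is left to the caller.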