| Background: Over the past 50 years, global air pollution has become increasingly serious, with the concentration of particulate matter (PM) rising by more than 38 percent. Developing countries such as China and India have gradually become the main sources of air pollution, which has drawn attention and concern from researchers, the public and governments in these countries. The demand for real-time, accurate monitoring of ambient air quality has therefore become ever more urgent. However, ground monitoring sites are relatively expensive to establish and maintain, so the available air pollution monitoring data are relatively scarce. Researchers have therefore proposed using satellite-retrieved aerosol optical depth (AOD) products to estimate near-surface PM concentrations. This idea is a hot topic in global research, but a series of problems remain to be solved. First, retrieved AOD data face a common, serious and urgent problem: a non-random missing rate of over 70% to 90%, which seriously affects the accuracy of air pollution exposure estimation and of subsequent health effect evaluation. Second, at larger spatio-temporal scales, methods based on traditional parametric models sometimes struggle to capture complex spatio-temporal heterogeneity, resulting in poorer model performance. Furthermore, accurately evaluating model performance after model construction is another critically important but thorny issue, because some studies have reported that traditional cross-validation may be biased when monitoring sites are unevenly distributed and excessively sparse. Objective: The purpose of this study is to provide a feasible and effective solution to a series of problems in estimating ground-level concentrations of fine particulate matter from AOD products. It covers three aspects: highly non-random missingness in AOD data, complex spatio-temporal heterogeneity in the AOD model, 
and the relationship between the number of adjacent sites and model performance. Methods: To address the multiple difficulties in estimating PM from AOD, we proposed a two-step interpolation method (preliminary interpolation with a mixed-effect model, followed by secondary interpolation with inverse distance weighting) to alleviate the high missing rate of AOD products. Daily PM2.5 concentrations with higher coverage and higher accuracy were then estimated with a combined machine learning method that joins the extreme gradient boosting (XGBoost) technique with a nonlinear exposure lag model (NELRM). For model evaluation, a two-stage meta-analysis (stage 1: exploration of different partitions; stage 2: pooling the effects of each partition) was used to explore the relationship between the number of adjacent sites and the extrapolation performance of the final model. Results: The AOD missing rate in China was as high as 87.9%. After the two-step interpolation, the missing rate was reduced to 13.83%, while interpolation accuracy was maintained (CV R-square of 0.76). Compared with the NELRM alone, the combined method improved performance by more than 56% nationally (CV R-square rose from 0.55 to 0.86), and the cross-validation RMSE was nearly halved (from 26.80 μg/m3 to 14.98 μg/m3). In the two-stage meta-analysis, the estimated performance across all raster cells nationwide was 11.35% lower than the traditional cross-validation result (95% CI: 0.34-22.37%); that is, the prediction accuracy (spatial CV R-square) decreased from 0.84 to 0.74 (95% CI: 0.65-0.83). Conclusion: Although a series of problems exist in remote sensing-based PM2.5 estimation, this study provides a set of solutions to them. First, a two-step interpolation method is proposed to mitigate missing data in AOD products, and the combined machine learning method estimates high-coverage and 
high-accuracy daily average PM2.5 levels, which can help assess population-average or individual exposure to air pollution more accurately. In addition, the two-stage meta-analysis helps to further explore the relationship between the number of neighboring sites and the extrapolation performance of the model, so that the overestimation in traditional cross-validation caused by limited sample representativeness can be avoided. These results provide an important reference for further improving the accuracy of air pollution monitoring in the future. |
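
The second interpolation step named in the Methods, inverse distance weighting (IDW), can be illustrated with a minimal sketch. The abstract does not state the distance power, the neighborhood size, or how the mixed-effect first step feeds into IDW, so the function name `idw_fill`, the default `power=2.0`, and the use of all known cells as neighbors are assumptions made purely for illustration, not the study's actual configuration.

```python
import numpy as np

def idw_fill(known_xy, known_values, query_xy, power=2.0, eps=1e-12):
    """Estimate values at query points by inverse distance weighting.

    Hypothetical helper: `power` and the all-neighbors choice are
    illustrative assumptions, not parameters reported in the abstract.
    """
    known_xy = np.asarray(known_xy, dtype=float)          # (n_known, 2)
    known_values = np.asarray(known_values, dtype=float)  # (n_known,)
    query_xy = np.asarray(query_xy, dtype=float)          # (n_query, 2)

    # Pairwise Euclidean distances, shape (n_query, n_known).
    diff = query_xy[:, None, :] - known_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Closer cells get larger weights; eps avoids division by zero
    # when a query point coincides with a known cell.
    weights = 1.0 / np.maximum(dist, eps) ** power

    # Weighted average of the known values for each query point.
    return (weights * known_values).sum(axis=1) / weights.sum(axis=1)

# Two grid cells with a known AOD value, two cells to fill.
filled = idw_fill(
    known_xy=[(0.0, 0.0), (2.0, 0.0)],
    known_values=[0.2, 0.6],
    query_xy=[(1.0, 0.0), (0.0, 0.0)],
)
```

In this toy case the first query point is equidistant from both known cells, so it receives their plain average, while the second coincides with a known cell and effectively inherits that cell's value.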