| Atmospheric particulate matter associated with haze pollution has aroused worldwide concern over the past decades. Accurate monitoring of haze-related air quality indicators, such as aerosol optical depth (AOD) and the concentrations of particulate matter (PMx) like PM2.5 and PM10, is thus of critical importance. In general, AOD and PMx concentrations can be acquired from ground monitors, chemical transport models, and satellite-based retrievals. Although ground monitors provide accurate, high-frequency measurements, they cannot provide worldwide coverage because monitoring sites are sparsely and unevenly distributed in space. Chemical transport models can provide global simulations of atmospheric aerosols and particulate matter, but their coarse resolution leaves the numerical simulations with large uncertainty at local scales. Given their extensive spatial coverage and high accuracy, satellite-derived AOD data have been widely applied to estimate surface PMx concentrations, which has become the principal way to resolve the spatial distribution of PMx pollution. Nevertheless, extensive data gaps in AOD retrievals caused by cloud cover and bright surfaces significantly limit the potential of such datasets for generating spatially contiguous PMx concentrations. Given the situation of “abundant data but poor knowledge” in remote sensing, it is advisable to generate high-quality datasets from diverse remotely sensed raw data to better support haze pollution management and the corresponding health risk assessment. In this study, a big data analytic approach was developed to integrate multi-modal AOD and related data acquired from distinct satellites, numerical models, and ground monitors, with the aim of generating long-term coherent and spatially contiguous AOD and PMx datasets at 1-km spatial and daily temporal resolution in China since 2000. Toward this goal, a variety of methods such as machine learning, missing value imputation, data fusion and
assimilation were seamlessly integrated to exploit the autocorrelation and complementary effects embedded in different data sources in space and time. Using these gridded datasets, long-term trends of haze pollution and the related population exposure risks in China during the past two decades were evaluated. The main findings and conclusions of this study are summarized as follows: (1) A diurnal-cycle-constrained matrix completion method, built on the empirical orthogonal function method, was developed to merge hourly daytime AOD retrievals from Himawari-8 and fill AOD data gaps in different hours. This method maximizes the spatial coverage of daytime AOD observations, because a data gap at a grid cell in one specific hour can be reconstructed from retrievals available during other hours, with reference to the diurnal cycle derived from neighboring observations. Spatially contiguous AOD maps were then derived by fusing this partially gap-filled dataset with bias-corrected MERRA-2 AOD simulations and with AOD estimates derived from in-situ air pollutant concentration measurements. The random forest method was applied to downscale the MERRA-2 AOD simulations and the ground-based AOD estimates. Ground-based validation indicated that the fused AOD dataset had higher accuracy than the original satellite-based AOD retrievals, with R of 0.82 and RMSE of 0.24. Such a high-temporal-resolution AOD dataset supports better monitoring of the life cycle and the variations of dust and haze pollution in space and time. (2) To provide actionable guidance on how to generate gap-free PMx datasets from remotely sensed data, a variety of popular AOD and PMx data manipulation schemes for generating full-coverage concentration maps were compared, taking PM2.5 as an example. The results suggest that the optimal mapping scheme is to derive PM2.5 concentrations from gap-free AOD maps with
in-situ PM2.5 concentration measurements subsequently fused with the gridded PM2.5 estimates. The results also revealed a drawback of including PM2.5 measurements from neighboring sites as predictors in the PM2.5 estimation model: the significant spatial autocorrelation of PM2.5 can bias the machine learning model. Moreover, the spatial distribution of PM2.5 estimates depends strongly on the parameter configuration used when modeling spatial autocorrelation terms, especially over regions with few monitoring sites. (3) A Long-term Gap-free High-resolution Air Pollutant (LGHAP) concentration dataset covering AOD, PM2.5, and PM10 was generated by integrating AOD and related data acquired from multiple polar-orbiting satellites, numerical simulations, and ground monitors in China, at daily 1-km resolution for the period 2000 to 2020. Ground-based validation indicates good agreement between the LGHAP dataset and ground measurements, with R of 0.91, 0.95, and 0.94 and RMSE of 0.21, 12.03 μg/m³, and 19.56 μg/m³ for AOD, PM2.5, and PM10, respectively. (4) Using the LGHAP dataset, the long-term variability of haze pollution and population exposure risks in China since 2000 was assessed. The results indicated three typical periods of haze pollution in China: an increasing period during 2000–2007, a transition period during 2008–2013, and a reduction period after 2014. Meanwhile, the overall haze pollution level in 2018 was found to be comparable to that of 2000, demonstrating the effectiveness of the implemented clean air actions in reducing haze-related emissions. Nonetheless, more than one-third of the population was still exposed to harmful haze pollution, underscoring the critical importance of continuing to reduce haze-related emissions in China. |
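
The gap-filling idea in finding (1), where a missing AOD value at one grid cell and hour is reconstructed from retrievals at other hours and cells, can be illustrated with a generic iterative EOF (truncated SVD) imputation. The sketch below is not the diurnal-cycle-constrained method of this study. The function name `eof_gap_fill`, its parameters, the number of modes, and the synthetic hours-by-cells matrix are all assumptions made for illustration, and the numbers are made up rather than taken from Himawari-8.

```python
import numpy as np


def eof_gap_fill(field, n_modes=2, n_iter=500, tol=1e-10):
    """Fill NaN gaps in an (hours x grid cells) matrix with a truncated-SVD (EOF) fit.

    Generic iterative EOF imputation (hypothetical helper, not the study's
    method): gaps start at each grid cell's mean over its observed hours and
    are then repeatedly replaced by the leading-mode reconstruction.
    """
    data = np.array(field, dtype=float)  # copy so the caller's array is untouched
    missing = np.isnan(data)
    if not missing.any():
        return data
    # Assumption: every grid cell (column) has at least one observed hour.
    cell_mean = np.nanmean(data, axis=0)
    data[missing] = np.broadcast_to(cell_mean, data.shape)[missing]
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(data, full_matrices=False)
        # Reconstruction from the n_modes leading EOFs.
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        change = np.max(np.abs(recon[missing] - data[missing]))
        data[missing] = recon[missing]  # observed values are never overwritten
        if change < tol:
            break
    return data


# Synthetic example (made-up numbers): 9 daytime hours x 50 grid cells.
hours = np.arange(9)
cells = np.arange(50)
diurnal = np.sin(np.pi * hours / 8.0)          # shared diurnal shape
amplitude = 0.1 + 0.4 * cells / 49.0           # per-cell diurnal amplitude
background = 0.3 + 0.2 * np.cos(cells / 5.0)   # per-cell background level
truth = np.outer(diurnal, amplitude) + background  # rank 2 by construction

# Deterministic gap pattern: 20% of entries, one or two hours per grid cell.
gaps = (3 * hours[:, None] + cells[None, :]) % 5 == 0
observed = np.where(gaps, np.nan, truth)

filled = eof_gap_fill(observed, n_modes=2)
max_error = np.max(np.abs(filled - truth)[gaps])
```

Because the synthetic field is exactly low rank, the leading modes carry all of the signal; real retrievals are noisy, so the number of retained modes would normally be chosen by cross-validation against withheld observations.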