
Research On Remote Sensing Estimation Of Polluting Gases By Ensemble Learning Considering Local And Global Information

Posted on: 2024-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
GTID: 1521307292460004
Subject: Photogrammetry and Remote Sensing

Abstract/Summary:
Human health and the surrounding environment are closely affected by the level of near-surface polluting gases. Fusing remote sensing observations with ground-based measurements is an important direction for accurately monitoring near-surface polluting gases. Owing to its many strengths, machine learning is increasingly regarded as the main approach for remote sensing estimation of near-surface polluting gas concentrations. In particular, ensemble learning, with its strong ability to capture complex nonlinear relationships, has become the principal model family in this field. However, previous works usually produce estimates with missing data, rarely consider the spatial heterogeneity of near-surface polluting gases, adopt a single type of sub-model with a simple structure, and ignore the non-uniform distribution of ground-based stations. This dissertation therefore develops ensemble learning models that consider both local and global information, combining local and global nonlinear features with ensemble learning algorithms to achieve high-precision remote sensing estimation of near-surface polluting gas concentrations. The study comprises the following four parts:

(1) An ensemble learning method for seamless estimation of near-surface polluting gas concentrations. Previous related works usually ignored missing coverage in the model input data, leaving gaps in the estimated results that cannot present spatiotemporally continuous information on near-surface polluting gases. Problems such as coarse spatial resolution and inadequate selection of multi-source nonlinear features also exist. This study therefore first adopts an Exemplar-based approach to reconstruct missing data in the high-resolution TROPOMI vertical column densities, and then uses multi-source variables to build a model based on the Light Gradient Boosting Machine (Light-GBM), a new-generation ensemble learning algorithm, taking ozone (O3), nitrogen dioxide (NO2), and carbon monoxide (CO) over China as examples. Validation results show that the R² values of Light-GBM across different schemes are 0.77-0.91, 0.70-0.83, and 0.55-0.71 for O3, NO2, and CO, respectively, which are better than those of previous related works. The estimated results present spatiotemporally continuous distributions of near-surface O3, NO2, and CO.

(2) A self-adaptive geospatially local ensemble learning method for estimation of near-surface polluting gas concentrations. Previous ensemble learning algorithms generally used all data in modeling, thereby exploiting only global information and not fully considering the spatial heterogeneity of near-surface polluting gases. This study therefore first integrates local spatial differences into ensemble learning and develops a self-adaptive geospatially local model, Self-adaptive Geospatially Local Boosting. The model uses the simple and fast Categorical Boosting as its sub-model and proposes an adaptive framework that selects samples for local modeling based on the geographic location of each station. The model is then built across China with O3 as an example, combining thermal infrared brightness temperature from Himawari-8 with multi-source variables. Validation results show that considering local nonlinear features significantly improves model performance beyond previous related works, with R² values of 0.85 for SICV and 0.72 for TESICV. The estimated results clearly reflect the spatial and temporal variations of near-surface O3 concentrations.

(3) A multi-layer cascaded global ensemble learning method for estimation of near-surface polluting gas concentrations. Previous ensemble learning algorithms usually built sub-models from simple decision trees, which are insufficient to mine global nonlinear features at large scales. In addition, the sub-models are all of a single type and fail to exploit differentiated information in the features, which limits model reliability. This study therefore first integrates global information on near-surface polluting gases into an ensemble learning algorithm and develops a multi-layer cascaded global model, the Multi-layer Cascaded Forest (MCF). The model adopts Random Forest, Extremely Randomized Trees, and eXtreme Gradient Boosting as sub-models with complex structures and connects them through cascaded layers. The model is then constructed over the globe with CO as an example, using TROPOMI data and multi-source variables. Validation results show that cascading multiple sub-models with complex structures significantly improves model performance, with an R of 0.73 in TEV. The estimation accuracy is superior to that of common machine learning algorithms and the GEOS-CF CO replay product. The estimated results accurately capture the spatial patterns of global near-surface CO.

(4) A local and global connected ensemble learning method for seamless estimation of near-surface polluting gas concentrations. For global modeling tasks, the multi-layer cascaded global ensemble learning model cannot account for the spatial heterogeneity of near-surface polluting gases. Moreover, global stations are distributed highly non-uniformly, which makes the self-adaptive geospatially local ensemble learning model difficult to apply. At the global scale, the model input data also contain more missing information, leading to gaps in the estimated near-surface polluting gas concentrations. To address these issues, this study first proposes a local-global connected ensemble learning model, the geospatially Local-Global Hybrid Ensemble Forest. The model consists of three modules: local, global, and local-global (geospatially weighted), with adaptive Light-GBM and MCF as the sub-models in local and global regions, respectively. In addition, a spatiotemporal reconstruction method is introduced to recover missing pixels in the model input data. Finally, the recovered TROPOMI data and multi-source variables are used to build the model over the globe with O3 as an example. Validation results show that connecting local and global spatiotemporally continuous information breaks through the bottleneck of existing algorithms, with R² values of 0.87 and 0.73 in SICV and TESICV, respectively. Model performance and robustness are greatly improved, outperforming common machine learning algorithms and the GEOS-CF replay O3 product. The estimated results accurately display the spatiotemporally continuous variations of near-surface O3 across the globe.
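Aspect (1) above fills gaps in TROPOMI vertical column densities with an Exemplar-based approach. The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, written in the simpler pixel-at-a-time style of texture synthesis rather than full patch copying. Each gap pixel (marked `None`) is compared with every fully observed window, and the centre value of the window with the smallest squared difference over the gap pixel's observed neighbours is copied in. The function names `window` and `exemplar_fill`, the 3x3 window, and the "most observed neighbours first" priority are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]  # None marks a missing pixel (hypothetical layout)


def window(grid: Grid, r: int, c: int, half: int) -> Optional[List[Optional[float]]]:
    """Return the (2*half+1) x (2*half+1) window centred on (r, c), or None if it leaves the grid."""
    if r - half < 0 or c - half < 0 or r + half >= len(grid) or c + half >= len(grid[0]):
        return None
    return [grid[i][j] for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def exemplar_fill(grid: Grid, half: int = 1) -> Grid:
    """Fill None pixels one at a time from the best-matching fully observed exemplar window."""
    out = [row[:] for row in grid]  # work on a copy; the input grid is not modified
    rows, cols = len(out), len(out[0])

    def known_count(rc: Tuple[int, int]) -> int:
        w = window(out, rc[0], rc[1], half)
        return -1 if w is None else sum(v is not None for v in w)

    while True:
        gaps = [(r, c) for r in range(rows) for c in range(cols) if out[r][c] is None]
        if not gaps:
            return out
        # Priority: the gap pixel with the most observed neighbours is filled first.
        r, c = max(gaps, key=known_count)
        target = window(out, r, c, half)
        if target is None:
            raise ValueError("gaps touching the image border are not handled in this sketch")
        best, best_cost = None, float("inf")
        for i in range(rows):
            for j in range(cols):
                src = window(out, i, j, half)
                if src is None or any(v is None for v in src):
                    continue  # exemplars must be fully observed
                # Sum of squared differences over the target's observed pixels only.
                cost = sum((t - s) ** 2 for t, s in zip(target, src) if t is not None)
                if cost < best_cost:
                    best, best_cost = src, cost
        if best is None:
            raise ValueError("no fully observed exemplar window available")
        out[r][c] = best[len(best) // 2]  # copy the exemplar's centre pixel
```

Because values are copied from observed windows rather than interpolated, this kind of fill preserves local texture well but cannot extrapolate smooth trends. That trade-off is one reason a production method would typically add confidence terms or auxiliary predictors.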
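The validation results above are reported as R² values. The abstract does not state which definition is used. Some studies report the coefficient of determination, and others report the squared Pearson correlation between estimates and observations, so the following sketch assumes the former: R² = 1 - SS_res / SS_tot. The function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

For example, observations `[1, 2, 3, 4]` against estimates `[1, 2, 3, 5]` give SS_res = 1 and SS_tot = 5, so R² = 0.8. A constant prediction equal to the observed mean gives exactly 0.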
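Aspect (2) above selects samples for local modeling adaptively from each station's geographic location, but the selection rule itself is not described. The following sketch shows one plausible rule, which is an assumption and not the dissertation's method. Around a target station, a search radius grows geometrically until at least `min_stations` stations fall inside it, so dense regions get small neighbourhoods and sparse regions get large ones. The function names and default parameter values are hypothetical. The samples from the selected stations would then train a local sub-model (Categorical Boosting in the dissertation), which is not shown here.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_local_stations(target: Tuple[float, float],
                          stations: Dict[str, Tuple[float, float]],
                          min_stations: int = 10,
                          start_radius_km: float = 100.0,
                          growth: float = 1.5,
                          max_radius_km: float = 2000.0) -> Tuple[List[str], float]:
    """Grow a radius around `target` until it holds `min_stations` stations (or hits the cap).

    Returns the selected station IDs, nearest first, and the radius finally used.
    """
    if growth <= 1.0:
        raise ValueError("growth must be > 1 so the radius actually expands")
    dist = {sid: haversine_km(target[0], target[1], lat, lon)
            for sid, (lat, lon) in stations.items()}
    radius = start_radius_km
    while True:
        chosen = sorted((sid for sid, d in dist.items() if d <= radius), key=dist.get)
        if len(chosen) >= min_stations or radius >= max_radius_km:
            return chosen, radius
        radius = min(radius * growth, max_radius_km)
```

With stations spaced one degree of longitude apart along the equator (about 111 km), asking for three stations around the origin expands the radius from 100 km to 150 km and then to 225 km before stopping.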
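Aspect (3) above connects several sub-models through cascaded layers. The abstract does not specify how the layers pass information. The following sketch borrows the scheme used in deep-forest style cascades, which is an assumption here. Each layer's sub-model predictions are appended to the original features as input to the next layer, and training-row predictions are produced out-of-fold to limit leakage. So that the sketch runs on the standard library alone, a tiny hypothetical `KNNRegressor` stands in for Random Forest, Extremely Randomized Trees, and eXtreme Gradient Boosting. Any object with `fit(X, y)` returning itself and `predict(X)` could be plugged in instead. The layer count is fixed here for simplicity. Real cascades often keep adding layers until validation accuracy stops improving.

```python
from typing import Callable, List, Protocol, Sequence

Row = List[float]


class Regressor(Protocol):
    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "Regressor": ...
    def predict(self, X: Sequence[Row]) -> List[float]: ...


class KNNRegressor:
    """Hypothetical stand-in sub-model (k-nearest neighbours) so the sketch needs no extra packages."""

    def __init__(self, k: int = 3) -> None:
        self.k = k

    def fit(self, X: Sequence[Row], y: Sequence[float]) -> "KNNRegressor":
        self.X, self.y = [list(r) for r in X], list(y)
        return self

    def predict(self, X: Sequence[Row]) -> List[float]:
        preds = []
        for row in X:
            order = sorted(range(len(self.X)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(row, self.X[i])))
            nearest = order[: self.k]
            preds.append(sum(self.y[i] for i in nearest) / len(nearest))
        return preds


def cascade_predict(X_train: Sequence[Row], y_train: Sequence[float], X_test: Sequence[Row],
                    factories: Sequence[Callable[[], Regressor]],
                    n_layers: int = 2, n_folds: int = 3) -> List[float]:
    """Train a cascade in which each layer feeds [raw features + its predictions] forward."""
    if n_layers < 1 or n_folds < 2 or not factories:
        raise ValueError("need n_layers >= 1, n_folds >= 2 and at least one sub-model")
    n = len(X_train)
    aug_train = [list(r) for r in X_train]
    aug_test = [list(r) for r in X_test]
    for _ in range(n_layers):
        train_cols, test_cols = [], []
        for make in factories:
            # Out-of-fold predictions for training rows, so the next layer never sees
            # outputs from a model fitted on those same rows.
            oof = [0.0] * n
            for f in range(n_folds):
                val = [i for i in range(n) if i % n_folds == f]
                trn = [i for i in range(n) if i % n_folds != f]
                model = make().fit([aug_train[i] for i in trn], [y_train[i] for i in trn])
                for i, p in zip(val, model.predict([aug_train[i] for i in val])):
                    oof[i] = p
            train_cols.append(oof)
            test_cols.append(make().fit(aug_train, y_train).predict(aug_test))
        # The next layer sees the ORIGINAL features plus this layer's predictions.
        aug_train = [list(X_train[i]) + [col[i] for col in train_cols] for i in range(n)]
        aug_test = [list(X_test[j]) + [col[j] for col in test_cols] for j in range(len(X_test))]
    # Final estimate: mean of the last layer's sub-model predictions.
    return [sum(col[j] for col in test_cols) / len(test_cols) for j in range(len(X_test))]
```

Usage might look like `cascade_predict(X, y, X_new, [lambda: KNNRegressor(1), lambda: KNNRegressor(3)])`. Mixing differently configured sub-models in the same layer is what lets later layers exploit their differing errors.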
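Aspect (4) above introduces a spatiotemporal reconstruction method for missing input pixels without describing it. As an illustration only, the following sketch uses a deliberately simple temporal substitution, which is not claimed to be the dissertation's method. Gaps on day `t` are filled from the temporally nearest day that observed those pixels, after rescaling that day by the ratio of the two days' means over pixels observed on both. The function name `temporal_fill` and the flattened-image layout are hypothetical.

```python
from typing import List, Optional

Image = List[Optional[float]]  # flattened pixels of one day; None marks a gap


def temporal_fill(series: List[Image], t: int) -> Image:
    """Fill gaps in series[t] from the nearest other days, rescaled by a common-pixel mean ratio."""
    observed = series[t]
    target = list(observed)
    # Reference days ordered by temporal distance: t-1, t+1, t-2, t+2, ...
    order = sorted((k for k in range(len(series)) if k != t), key=lambda k: (abs(k - t), k))
    for k in order:
        if all(v is not None for v in target):
            break
        ref = series[k]
        # Pixels observed on both days (original observations only, not earlier fills).
        common = [i for i in range(len(observed)) if observed[i] is not None and ref[i] is not None]
        if not common:
            continue
        ref_mean = sum(ref[i] for i in common) / len(common)
        if ref_mean == 0:
            continue
        scale = sum(observed[i] for i in common) / len(common) / ref_mean
        for i, v in enumerate(target):
            if v is None and ref[i] is not None:
                target[i] = ref[i] * scale
    return target
```

For a day `[2, None, 6, None]` whose previous day was `[1, 2, 3, 4]`, the scale is 2 and the gaps become 4 and 8. A genuine spatiotemporal method would usually also use spatial neighbours to handle days whose overall level and spatial pattern both change.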
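The local-global (geospatially weighted) module in aspect (4) combines local and global sub-model estimates. Its weighting function is not given in the abstract. The following sketch assumes a Gaussian kernel on the distance to the nearest ground station. Near stations, where local models have dense training data, the local estimate dominates. Far from stations, the blend falls back to the global estimate. The function name `blend_local_global` and the 300 km bandwidth are hypothetical choices.

```python
import math


def blend_local_global(local_pred: float, global_pred: float,
                       dist_to_nearest_station_km: float,
                       bandwidth_km: float = 300.0) -> float:
    """Weight the local estimate by a Gaussian kernel of distance; the remainder goes to the global one."""
    if bandwidth_km <= 0:
        raise ValueError("bandwidth_km must be positive")
    if dist_to_nearest_station_km < 0:
        raise ValueError("distance cannot be negative")
    w_local = math.exp(-((dist_to_nearest_station_km / bandwidth_km) ** 2))  # in (0, 1]
    return w_local * local_pred + (1.0 - w_local) * global_pred
```

At a station the result equals the local estimate. At one bandwidth away the local weight is 1/e (about 0.37). Thousands of kilometres from any station the result is effectively the global estimate. A smooth kernel avoids visible seams where local and global regions meet.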
Keywords/Search Tags: Near-surface polluting gas, Remote sensing, Ensemble learning, Multi-source fusion, Local and global