| Precipitation retrieval algorithms based on geostationary satellite infrared data are widely used in operational meteorology and research. However, because of sample imbalance, quantitative precipitation estimation models based on machine learning often underestimate precipitation intensity; their accuracy for moderate and heavy rain areas is low, and they can hardly identify heavy rain areas at all. Using Himawari-8 geostationary satellite observations and GFS (Global Forecast System) numerical forecast products for summer over East Asia, this study builds precipitation retrieval models with the random forest and deep forest machine learning algorithms. The models retrieve the precipitation grade directly in order to improve the identification of moderate and heavy rain. The GPM (Global Precipitation Measurement) multi-satellite precipitation estimate is taken as the target variable for model training, and a two-step retrieval model is constructed that first identifies the rainfall area and then estimates the precipitation grade within it. The model results are then compared with observations from ground weather stations. The main conclusions are as follows: (1) To reduce the influence of imbalanced precipitation samples, this study uses resampling to change the distribution of the training data, so as to improve the model's accuracy in identifying minority-class samples. Retrieval models based on the random forest and deep forest algorithms are each tested against GPM multi-satellite precipitation products. The results show that the deep forest model estimates the precipitation grade of precipitation events with an accuracy of up to 0.78, whereas the random forest model reaches 0.70. The performance of the deep forest model on the verification dataset 
is better than that of the random forest model. Both are ensemble algorithms based on decision trees, which learn stably and are not prone to overfitting. (2) The models are then applied to test datasets from periods different from the training period to further evaluate their performance. Both models show a high probability of detection and a high false alarm rate in the afternoon, when convective precipitation is likely to occur, and their performance differs little across the test datasets. The precipitation pixels simulated by the random forest model describe the outline of the precipitation area in the satellite product well, but with some overestimation, and in areas where heavy rain samples are densely distributed the model simulates a larger extent of heavy rain. Verified against ground station observations, the deep forest model better identifies the rain belt actually observed by the weather stations, and it overestimates precipitation less than the random forest model does. Both models can identify some moderate and heavy rain areas, with the deep forest model performing better for both. Compared with machine learning models that retrieve precipitation intensity directly, the models' ability to identify moderate and heavy rain is improved. (3) The study discusses the finding that GFS meteorological variables contribute more than satellite infrared brightness temperature data during model training. The accuracy of the machine learning algorithms is higher than that of the numerical forecast data, which reveals the important role of satellite infrared brightness temperature data in the machine learning models. Satellite multispectral data and cloud parameter information can better reflect precipitation intensity, thus 
improving the accuracy of precipitation identification by the model. |
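
The resampling step and the detection scores mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not say which resampling method was used, so random oversampling of minority classes is assumed here, and "false alarm rate" is assumed to mean the false alarm ratio commonly used in precipitation verification. The function names `random_oversample` and `pod_far`, the grade labels, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Assumed resampling scheme: duplicate randomly chosen samples of each
    minority class until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def pod_far(observed_rain, predicted_rain):
    """Rain/no-rain scores from a 2x2 contingency table.

    POD = hits / (hits + misses)
    FAR = false alarms / (hits + false alarms)  (false alarm ratio, assumed)
    """
    pairs = list(zip(observed_rain, predicted_rain))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Toy data: four light-rain pixels, one moderate, one heavy (hypothetical).
toy_features = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
toy_labels = ["light", "light", "light", "light", "moderate", "heavy"]
balanced_x, balanced_y = random_oversample(toy_features, toy_labels)
print(Counter(balanced_y))

# Step-one style check: observed vs. predicted rain flags per pixel.
print(pod_far([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

Oversampling only duplicates existing minority samples, so in this sketch it would be applied to the training split alone; the verification and test data keep their natural class distribution.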