| Since 2011, maize has been the crop with the largest sown area and the highest grain production in China. The Loess Plateau is one of the dominant maize-producing areas in China owing to its abundant light and heat resources. However, the uneven distribution of precipitation causes frequent droughts, which are the main constraint on maize production in this region. Accurate and timely in-season crop yield predictions are therefore crucial for policy making to mitigate meteorological disasters and safeguard regional food security. There are two main strategies for crop yield prediction, mechanism-based methods and statistics-based methods, and most of them are driven by global solar radiation (Rs). However, weather stations that measure Rs directly cover only a few locations because of the high cost of establishing and maintaining the measurement equipment. Mechanism-based cropping system models (CSMs) are widely used for crop yield prediction, but they give unsatisfactory simulations and predictions when crops suffer from drought and other stresses. Statistics-based methods, on the other hand, cannot explain their predictions. A knowledge gap therefore remains in combining mechanism-based and statistics-based methods to improve crop yield prediction. In this study, a framework for dynamic in-season maize yield prediction was established based on an ensemble of multi-source data and methods, including a machine learning algorithm, a crop growth simulation model, a remote-sensing-based vegetation index, a drought index, and meteorological data. First, the Angstrom-Prescott (A-P) formula was employed to fill gaps in daily Rs in the Loess Plateau, with its empirical coefficients a and b estimated at different time scales by machine learning algorithms for the whole of mainland China. Next, sun-induced chlorophyll fluorescence (SIF) and the self-calibrating Palmer Drought Severity Index (scPDSI) were used to monitor and evaluate drought during the vegetation growing seasons of 2006–2020 in the Loess Plateau. Then, two dynamic within-season yield prediction methods were established, based on the CERES-Maize model in DSSAT (Decision Support System for Agrotechnology Transfer) and on the random forest algorithm, respectively. Finally, the two prediction frameworks were integrated into a comprehensive yield prediction system that provides reliable maize yield predictions throughout the maize growing season in the Loess Plateau. The main findings and conclusions are as follows.

(1) The A-P coefficients showed clear spatio-temporal variations across mainland China. The mean values of coefficients a and b ranged from 0.16 to 0.23 and from 0.53 to 0.58, respectively, among climatic zones. The coefficients varied more at the daily scale than at the monthly and yearly scales, and daily coefficients produced the largest Rs estimation errors. Compared with the daily coefficients, monthly and yearly coefficients (including the FAO-recommended values) improved the accuracy of Rs estimation. However, the FAO-recommended values overestimated coefficient a and underestimated coefficient b in most regions of mainland China. In all four climatic zones of mainland China, monthly coefficients only slightly outperformed yearly coefficients but made daily Rs estimation considerably more complicated. The yearly A-P coefficients estimated in this study are therefore recommended for future Rs estimation in China.

(2) The correlation between SIF and scPDSI in the Loess Plateau depended on vegetation type and growth stage. The dynamics of monthly SIF and scPDSI indicated that photosynthesis increased while drought eased in the Loess Plateau in 2006–2020. The correlation coefficient (r) between SIF and scPDSI also varied in space and time, with significant correlations in the early growing season over about 50% of the area. Furthermore, the maximum correlation coefficient and the corresponding lag time, which reflect drought resistance, differed among vegetation types. In general, trees had the highest drought resistance, followed by farmland, shrubs, and grassland. In addition, vegetation in the arid and semi-arid regions of the Loess Plateau was more sensitive to drought than that in the semi-humid regions.

(3) The CERES-Maize model driven by a combination of real-time and historical weather data provided acceptable maize yield predictions after tasseling. Errors of daily yield predictions were large before tasseling, which usually occurs 50–60 d before maturity for maize in the Loess Plateau. For example, at the Yulin site in Shaanxi Province in 2010, the mean absolute relative error (ARE) and coefficient of variation (CV) of daily yield predictions were 23.8% and 20.2% before tasseling, and 6.6% and 5.7% after tasseling, respectively. Moreover, two strategies for selecting analogue weather years were established to improve the accuracy of daily yield predictions. For the strategy using different numbers of leading years before the growing season, the most reliable predictions were obtained with weather data from the 10 years preceding the planting year, with an overall average ARE of 11.7%. For the strategy of selecting analogue years with the k-NN algorithm, the most reliable predictions were obtained with analogue years selected on accumulated precipitation alone, with an overall average ARE of 11.5%. In most cases, both optimal strategies outperformed the original predictions that used all 50 years of local weather data. However, the analogue weather data selected by the k-NN algorithm gave unsatisfactory yield predictions due to
limited measured meteorological data early in the growing season. Overall, the 10-year leading weather data are recommended for in-season maize yield prediction in the Loess Plateau.

(4) Random forest models based on multiple datasets provided stable maize yield predictions across different growth stages. A random forest model was developed for in-season maize yield prediction with the enhanced vegetation index (EVI), scPDSI, SIF, and multiple weather variables as inputs at four stages of the maize growing season: the three-leaf, jointing, tasseling, and maturity stages. Whereas the CERES-Maize model performed poorly early in the growing season, the random forest model performed stably throughout the season. The RMSE of yield predictions ranged from 620 to 720 kg ha⁻¹ across the four growth stages. In general, prediction errors decreased as maize development advanced, although the trends varied among predictors. Moreover, prediction accuracy did not improve as the number of input variables increased. The optimal input combinations were EVI, EVI, SIF, and climate variables + SIF at the three-leaf, jointing, tasseling, and maturity stages, respectively.

(5) The integrated framework, which combined the yields predicted by the CERES-Maize model, remote-sensing vegetation indices, and the random forest algorithm, further improved prediction accuracy throughout the maize growing season. It inherited the stability of the random forest model and the dynamic prediction ability of the CERES-Maize model for maize yield prediction in the Loess Plateau. Compared with the original predictions, the integrated model improved prediction accuracy at all four growth stages. The R² values all exceeded 0.9, and the RMSE and nRMSE values were 255–463 kg ha⁻¹ and
3.5%–6.3%, respectively. Additionally, incorporating daily yield predictions and stage values of EVI, SIF, and scPDSI improved prediction accuracy on different dates between the tasseling and maturity stages. These results indicate that the integrated framework can improve the timeliness of maize yield predictions by overcoming the discontinuity of remote-sensing-based vegetation index and drought index observations. Overall, the integrated maize yield prediction framework shows great potential for providing stable, high-quality predictions within the maize growing season in the Loess Plateau. |
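Finding (1) rests on the Angstrom-Prescott relation Rs = (a + b·n/N)·Ra, where n is the measured sunshine duration, N the maximum possible daylength, and Ra the extraterrestrial radiation. The Python sketch below is a minimal illustration under stated assumptions, not the study's implementation: Ra and N follow the formulas in FAO Irrigation and Drainage Paper No. 56, the defaults a = 0.25 and b = 0.50 are the FAO-recommended values, and `fit_ap_coefficients` swaps in classical least-squares calibration in place of the machine-learning estimation used in the study. All function names are hypothetical.

```python
import math

# FAO-recommended default Angstrom-Prescott coefficients.
FAO_A, FAO_B = 0.25, 0.50
GSC = 0.0820  # solar constant, MJ m-2 min-1


def extraterrestrial_radiation(lat_deg, doy):
    """Return (Ra in MJ m-2 day-1, maximum daylength N in hours) as in FAO-56."""
    phi = math.radians(lat_deg)
    angle = 2 * math.pi * doy / 365
    dr = 1 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)  # solar declination, rad
    # Clamp so that polar-day/polar-night latitudes do not make acos fail.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)                       # sunset hour angle, rad
    ra = (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    return ra, 24 / math.pi * ws


def angstrom_prescott(n_hours, lat_deg, doy, a=FAO_A, b=FAO_B):
    """Daily global solar radiation Rs = (a + b * n/N) * Ra, MJ m-2 day-1."""
    ra, n_max = extraterrestrial_radiation(lat_deg, doy)
    return (a + b * n_hours / n_max) * ra


def fit_ap_coefficients(sunshine_fraction, rs_over_ra):
    """Least-squares fit of Rs/Ra = a + b * (n/N) from paired observations; returns (a, b)."""
    x, y = list(sunshine_fraction), list(rs_over_ra)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    ra, n_max = extraterrestrial_radiation(-20.0, 246)
    print(f"Ra = {ra:.1f} MJ m-2 day-1, N = {n_max:.1f} h")
    print(f"Rs (7 h sunshine) = {angstrom_prescott(7.0, -20.0, 246):.1f} MJ m-2 day-1")
```

For 20° S on day 246 this gives Ra ≈ 32.2 MJ m⁻² day⁻¹ and N ≈ 11.7 h. Passing locally fitted yearly coefficients as `a` and `b` is how the recommendation in finding (1) would be applied in such a setup.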
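Finding (2) uses the largest correlation coefficient between SIF and scPDSI, together with the lag at which it occurs, as an indicator of drought resistance. A minimal Python sketch of that lagged-correlation step for one pixel's monthly series follows; the 0–3 month lag window, the function names, and the use of plain Pearson r without a significance test are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_lagged_correlation(sif, scpdsi, max_lag=3):
    """Correlate SIF in month t with scPDSI in month t - lag, for lag = 0..max_lag.

    Returns (best_lag, best_r, r_by_lag), where best_r is the largest r.
    """
    r_by_lag = {}
    for lag in range(max_lag + 1):
        # Pair SIF[t] with scPDSI[t - lag]: drop the first `lag` SIF values
        # and the last `lag` scPDSI values so both series have equal length.
        r_by_lag[lag] = pearson_r(sif[lag:], scpdsi[: len(scpdsi) - lag])
    best_lag = max(r_by_lag, key=r_by_lag.get)
    return best_lag, r_by_lag[best_lag], r_by_lag
```

A longer best lag would then suggest vegetation that responds more slowly to a given drought signal, which is the kind of contrast drawn between trees, farmland, shrubs, and grassland.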
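Finding (3) compares two ways of choosing historical weather to complete the season after the current date: a fixed window of preceding years, and k-NN selection of analogue years on accumulated precipitation. The sketch below illustrates both under simple assumptions: precipitation accumulated to the same day of year is the only feature, distance is the absolute difference, ties break on the earlier year, and each scenario splices this season's observed weather up to today onto the selected year's remaining days. Function names are hypothetical, and the CERES-Maize runs that each scenario would drive are not shown.

```python
def leading_years(planting_year, n_years=10):
    """Fixed-window strategy: the n_years calendar years before planting."""
    return list(range(planting_year - n_years, planting_year))


def select_analogue_years(current_precip, historical_precip, k=5):
    """k-NN strategy on a single feature: accumulated precipitation (mm) up to
    the current day of year. historical_precip maps year -> accumulation to the
    same day. Returns the k nearest years, nearest first (ties -> earlier year)."""
    return sorted(
        historical_precip,
        key=lambda yr: (abs(historical_precip[yr] - current_precip), yr),
    )[:k]


def build_weather_scenarios(observed_to_date, historical_daily, years):
    """One scenario per selected year: this season's observed daily values up to
    today, followed by that year's values for the remaining days."""
    today = len(observed_to_date)
    return {yr: observed_to_date + historical_daily[yr][today:] for yr in years}
```

In a setup like this, averaging the yields simulated under each scenario would give one prediction for the current day, and repeating the procedure daily produces the prediction series whose ARE and CV are reported above.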
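Findings (3) to (5) report ARE, CV, RMSE, nRMSE, and R². The helpers below use common textbook definitions, which are assumptions because the exact formulas are not stated here: CV uses the population standard deviation, nRMSE normalizes by the observed mean, and R² is the coefficient of determination rather than the squared Pearson r.

```python
import math
import statistics


def are_percent(predicted, observed):
    """Mean absolute relative error (%) of a series of predictions of one observed yield."""
    return 100 * statistics.fmean(abs(p - observed) / observed for p in predicted)


def cv_percent(predicted):
    """Coefficient of variation (%) of a prediction series (population SD / mean)."""
    return 100 * statistics.pstdev(predicted) / statistics.fmean(predicted)


def rmse(predicted, observed):
    """Root mean square error, in the units of the inputs (e.g. kg ha-1)."""
    return math.sqrt(statistics.fmean((p - o) ** 2 for p, o in zip(predicted, observed)))


def nrmse_percent(predicted, observed):
    """RMSE normalized by the observed mean, in percent."""
    return 100 * rmse(predicted, observed) / statistics.fmean(observed)


def r_squared(predicted, observed):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

For example, two daily predictions of 9,000 and 11,000 kg ha⁻¹ against a measured 10,000 kg ha⁻¹ give an ARE of 10% and a CV of 10% under these definitions.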