| Soil is the foundation of life and biodiversity, and soil pH is one of its basic properties. Soil pH plays an important role in controlling soil acidification and alkalization, promoting sustainable agricultural development, improving economic efficiency, ensuring food quality and safety, and coping with climate change. In this study, Anhui, Henan, Jiangsu and Shandong provinces were selected as the study area, and 17 environmental variables covering climate, topography and biology were combined with soil sampling point data to form a soil-environment data set. The data set was divided into training and validation sets in different ways. Spatial prediction models of surface soil pH in the study area were built with machine learning models and model fusion methods, and their accuracy was compared and analyzed. The uncertainty of the mapping results of the XGBoost and RF models was estimated, and the effects of environmental variables on soil pH were studied based on these two models. The main results are as follows: (1) Surface soil pH in the study area was 7.10 ± 1.09, with a moderate level of variation. The data showed a left-skewed, low-peaked distribution and approximately followed a normal distribution. Among the 599 soil samples, alkaline samples (pH > 7.5) accounted for 43.4% and acidic samples (pH < 6.5) for 28.9%. Alkaline samples outnumbered acidic samples; alkaline samples were concentrated in northern Henan Province, and acidic samples in southern Anhui Province. Correlation analysis showed that soil pH was significantly positively correlated with TWI and MRRTF (p < 0.01), significantly negatively correlated with Slope, EVI, NDVI, MAT, MAP and Aspect (p < 0.01), and significantly negatively correlated with SPI (p < 0.05). MAP had the strongest correlation with soil pH, with a correlation coefficient of -0.60 (p < 0.01). (2) The R² of the XGBoost and RF models on the training set and 
validation set are all above 0.5, and the CCC on the validation set is around 0.7, indicating relatively high accuracy. The SVM model has an extremely high training-set R² but a validation-set R² that is mostly below 0.5, which indicates overfitting. The GLM model has a low R² overall, with a validation-set R² below 0.5. Among the hyperparameters, eta has the greatest impact on the XGBoost model, mtry strongly influences the accuracy of the RF model, and gamma strongly influences the accuracy of the SVM model. The machine learning models achieve similar accuracy on the full environmental variable set and the Boruta-selected variable set, and the lowest accuracy on the principal component variable set. The models are more stable on data set 2 than on data set 1. (3) Model fusion accuracy is higher with the full environmental variable set and the Boruta-selected variable set: the average R² of the Stacking models is 0.587 and 0.590 on the training set and 0.562 and 0.564 on the validation set, respectively. Accuracy is lowest with the principal component variable set, where the average R² on the training and validation sets is 0.493 and 0.498, respectively. The Stacking1, Stacking2, Stacking6 and Stacking7 models are the more accurate ones. For the same base learners and different meta-learners, the model is more stable when the meta-learner is GLM. Increasing the number of base learners can improve model accuracy. The Blending models follow a pattern similar to that of the Stacking models, but the Stacking models are slightly more accurate. The model fusion methods are similar to the XGBoost and RF models in accuracy and stability, and significantly more accurate than the SVM and GLM models. (4) X, Y, MAP and MRVBF had important effects on soil pH modeling, and MAT and Slope also had some effect. In the XGBoost model, X, Y and MAP rank in the top 
three, followed by MRVBF, Slope and TWI; the top six environmental variables in the RF model are X, Y, MAT, MAP, MRVBF and MRRTF. The lower and upper limits of the uncertainty prediction have spatial patterns similar to those of the optimal prediction. The feature screening algorithm has a large influence on the uncertainty results of the XGBoost model, whereas the RF model is less affected. The uncertainty results show that the uncertainty difference is between 2 and 3 in most of the study area. It is smallest in northern Henan Province, eastern and western Shandong, and northern, central and eastern Jiangsu. In the central part of the study area and western Henan, the uncertainty difference is between 3 and 4. (5) The mapping results showed that soil in the study area changed from acidic to alkaline from south to north, a spatial pattern of "acidic in the south, alkaline in the north". Neutral soil covers the largest area, about half of the study area, followed by alkaline, acidic and strongly acidic soil. Strongly alkaline soil accounts for the lowest proportion, less than 1% of the study area. Soil pH was highest in central and northern Henan (7.5 < pH < 9.18) and lowest in the Dabie Mountains and southern Anhui (4.20 < pH < 6.5). Figure [27] Table [20] Reference [96]... |
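
The abstract reports model accuracy as R² and CCC without giving the formulas. As a minimal sketch, assuming R² is the usual coefficient of determination and CCC is Lin's concordance correlation coefficient computed with population (1/n) moments, the two metrics could be calculated as below. The function names and the pH values are hypothetical and invented for illustration; they are not taken from the study.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient using 1/n moments (assumed)."""
    n = len(observed)
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    ) / n
    # The squared mean difference penalises systematic bias between the series.
    return 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)


# Hypothetical observed and predicted pH values, for illustration only.
observed_ph = [5.2, 6.1, 6.8, 7.4, 7.9, 8.3]
predicted_ph = [5.6, 6.0, 6.9, 7.1, 7.8, 8.0]
print(round(r_squared(observed_ph, predicted_ph), 3))
print(round(concordance_cc(observed_ph, predicted_ph), 3))
```

Unlike a plain correlation coefficient, CCC drops below 1 when predictions are systematically offset from the observations, which may be why it is reported alongside R² for the validation set.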