
Comparison Of Digital Soil Mapping Methods Based On Feature Selection And Different Machine Learning

Posted on: 2021-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: H L Lu
GTID: 2370330605956904
Subject: Cartography and Geographic Information Engineering

Abstract/Summary:
Soil pH affects the physical, chemical, and biological processes of soil. Fertility, microbial and faunal activity, the C/N ratio, and humus formation are all closely related to soil pH. Predicting the spatial distribution of soil pH and mapping it digitally is therefore of great significance for soil quality monitoring and zoning management.

This study collected soil property data in Anhui Province, took soil pH as the research object, and used GIS methods to extract environmental variables such as terrain, vegetation indices, and climate. First, the influence of the environmental variables on soil pH was analyzed. Second, the Boruta algorithm, recursive feature elimination, simulated annealing feature selection, filter-based feature selection, and principal component analysis were applied to the environmental variables (feature mining) to obtain the optimal combination of variables. Based on the feature mining results, several machine learning methods (random forest, support vector regression, gradient boosting decision tree, and deep neural network) were used to build spatial prediction models of soil pH in Anhui Province. Finally, the prediction accuracy of the different combinations of feature mining methods and machine learning models was compared, and the tuning of model parameters was studied. The results provide a reference for selecting environmental variables and combining them with machine learning methods for digital soil mapping over large areas.

The main results are as follows:

(1) All of the feature mining methods reduce the number of original environmental variables to some extent, achieving dimensionality reduction and removing redundancy. Compared with modeling on the original environmental variables, machine learning models built on the feature mining results improve, to some extent, both the spatial prediction accuracy of soil pH and the quality of the digital soil map of Anhui Province. Among the feature mining methods, the Boruta algorithm ranks the environmental variables by importance; from high to low the order is X, MAP, MRVBF, MRRTF, MAT, EVI, Elevation, Slope, TWI, Y, and NDVI. The optimal feature combination obtained by recursive feature elimination is MAP, X, MRVBF, MRRTF, MAT, EVI, and Elevation, a total of 7 features. The optimal combination obtained by simulated annealing feature selection is X, MRRTF, Plan, Profile, Elevation, and MAT, a total of 6 features. The optimal combination obtained by filter-based feature selection is X, Y, EVI, NDVI, MRVBF, MRRTF, TWI, Plan, Slope, Elevation, and MAP, a total of 11 features. Principal component analysis yields 5 principal components. Taken together, these results indicate that feature mining is necessary before digital soil mapping.

(2) All of the machine learning models achieve high accuracy, and each has advantages in a different respect. On the training set, the gradient boosting decision tree is the best model (RMSE=0.32, MAE=0.23, R²=0.93), but it overfits and has the lowest stability. In terms of prediction accuracy, random forest is the best model (RMSE=0.48, MAE=0.57, R²=0.77). In terms of model stability, support vector regression is the best model: the difference in R² between the training set and the validation set is 0.04, lower than for the other models. Considering prediction accuracy and model stability together, the deep neural network is the best model. In terms of mapping, the spatial distribution of soil pH predicted by all models is roughly the same and largely matches the distribution of the observed values, showing an "acidic in the south, alkaline in the north" trend. The results show that machine learning methods are of research value for digital soil mapping.

(3) The parameters of the models affect accuracy to different degrees. The main parameters of the random forest model, ntree and mtry, have relatively little influence on the model, so parameter tuning is comparatively simple, and using the default values often has little effect on accuracy. For the support vector regression model, gamma and cost each have a large impact on accuracy, both individually and in combination, so the parameters need to be tuned during modeling. Several parameters of the gradient boosting regression model have a large impact on the final prediction accuracy, and different parameter combinations give markedly different results, so grid search is required to tune them. The DNN model has so many parameters that tuning them is a very complicated problem; grid search can also be used when computing resources permit.

Figures [26], tables [14], references [62]...
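The abstract does not say which statistic the filter-based feature selection uses. As a minimal sketch, assuming the filter is the absolute Pearson correlation between each environmental variable and soil pH, the idea can be illustrated as follows. The function names, the threshold, and all sample values are invented for illustration and are not taken from the thesis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def filter_select(features, target, threshold):
    """Keep features whose |correlation| with the target reaches the threshold."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    return kept, scores

# Invented sample values: five sites, soil pH and three candidate covariates.
ph = [5.0, 5.5, 6.0, 7.0, 8.0]
covariates = {
    "MAP": [1600, 1500, 1300, 1000, 800],  # made-up precipitation values
    "noise": [1, -1, 1, -1, 1],            # nearly uncorrelated with pH
    "constant": [3, 3, 3, 3, 3],           # carries no information
}
kept, scores = filter_select(covariates, ph, threshold=0.3)
```

A filter of this kind scores each variable independently of any model, which is what distinguishes it from wrapper methods such as recursive feature elimination or simulated annealing.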
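The accuracy and stability comparison in result (2) rests on RMSE, MAE, R², and the difference in R² between the training and validation sets. The sketch below shows one common way to compute these quantities. It assumes the standard definitions and is not necessarily the exact procedure used in the thesis; the helper name `stability_gap` is hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def stability_gap(train_true, train_pred, val_true, val_pred):
    """Training R2 minus validation R2; a smaller gap suggests a more stable model."""
    return r2(train_true, train_pred) - r2(val_true, val_pred)
```

A model with a very high training R² and a large gap is likely overfitting, which is the pattern the abstract reports for the gradient boosting decision tree.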
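Result (3) recommends grid search for parameters that interact, such as gamma and cost in support vector regression. A minimal sketch of the technique follows: every combination in the grid is scored and the one with the lowest validation error is kept. The scoring function here is a hypothetical stand-in with a known minimum; in practice it would fit the model with the given parameters and return its validation RMSE.

```python
import math
from itertools import product

def grid_search(score_fn, param_grid):
    """Evaluate score_fn on every parameter combination; return the lowest-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_rmse(gamma, cost):
    # Hypothetical stand-in for "fit SVR, return validation RMSE".
    # Its minimum (0.5) is placed at gamma=0.01, cost=10 purely for illustration.
    return (math.log10(gamma) + 2) ** 2 + (math.log10(cost) - 1) ** 2 + 0.5

grid = {"gamma": [0.001, 0.01, 0.1, 1], "cost": [0.1, 1, 10, 100]}
best_params, best_score = grid_search(toy_validation_rmse, grid)
```

The number of evaluations is the product of the grid sizes (16 here), which is why the abstract notes that grid search for a DNN is feasible only when computing resources permit.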
Keywords/Search Tags: Soil pH, Environmental variables, Feature mining, Machine learning, Model parameters, Digital soil mapping