Landslides are among the most important geological disasters in China and seriously threaten people's lives and property. Landslide susceptibility assessment aims to predict the spatial probability of future landslide occurrence. Machine learning has become a main tool for this assessment because of its strong ability to fit nonlinear relationships. However, owing to the complexity of landslide mechanisms, the theory and technology of machine learning-based landslide susceptibility assessment still suffer from the following problems: (1) landslides generally result from the coupling of multiple environmental factors, yet most machine learning models ignore the influence of these factor interactions on landslide susceptibility; (2) the current selection of non-landslide samples is highly subjective, which seriously affects the accuracy of susceptibility models and makes the assessment results less objective; and (3) the non-landslide samples required for susceptibility assessment are drawn with a high degree of randomness, which introduces uncertainty into the resulting susceptibility map, so that the map occasionally differs from actual conditions and its engineering applications suffer. To address these problems, this thesis takes Anhua County, Hunan Province, as the study area and aims to systematically construct a system of environmental factors affecting landslide occurrence, establish a machine learning model for landslide susceptibility analysis that considers factor interactions, reveal the influence of the ratio of landslide to non-landslide samples on machine learning-based susceptibility models, and scientifically quantify the uncertainty associated with traditional landslide susceptibility maps. The outcomes of this study will benefit landslide disaster prevention and mitigation. The main research contents and results are summarized as follows:

(1) After analyzing the landslide mechanism in the study area, 15 environmental factors, such as elevation, slope aspect, slope gradient, and stratigraphic lithology, are first identified. The environmental factor system for landslide susceptibility assessment is then constructed by analyzing and screening these factors with the Pearson correlation coefficient and the variance inflation factor (VIF). The data analysis shows that all 15 selected factors influence landslide occurrence in the study area to some extent and that there is no significant linear correlation or multicollinearity among them, which confirms the reasonableness of the established factor system.

(2) A new landslide susceptibility model based on the attentional factorization machine (AFM) is proposed to account for interactions between environmental factors. The AFM model is compared with the commonly used random forest and logistic regression models to verify its rationality, and a comparative experiment is used to explore how multicollinearity among the factors affects its accuracy. The results show that the AFM model gives the best assessment results and can calculate a weight for each combination of factors, so it better explains the intrinsic causes of landslides in the study area. In addition, the AFM model is more sensitive to multicollinearity among environmental factors.

(3) A general framework based on the Bayesian optimization algorithm is proposed for optimizing the ratio of landslide to non-landslide samples. Its effectiveness is verified by combining it with multiple machine learning models and different training-to-test set ratios. The results show that the framework can optimize the landslide-to-non-landslide sample ratio for multiple machine learning models and improves their prediction performance to varying degrees. For imbalanced sample data, an ensemble learning model with good adaptability and stable performance, such as random forest, is recommended. In addition, increasing the ratio of the training set to the test set is found to increase the area under the ROC curve (AUC) of the model.

(4) Because non-landslide samples are drawn at random, the traditional landslide susceptibility map is uncertain; this is verified by comparing two arbitrarily generated susceptibility maps. In view of this, a repeated buffer-controlled sampling method is proposed to generate landslide susceptibility confidence maps that quantify the uncertainty in traditional maps. Frequency ratios are used to evaluate the accuracy of the zoning results of both the traditional susceptibility maps and the proposed confidence maps, and four buffer-distance settings are considered to investigate how the buffer distance influences the susceptibility evaluation. The results show that the traditional susceptibility zoning map carries high uncertainty, which the proposed confidence map can effectively quantify, and that the zoning results of the confidence map are more accurate than those of the traditional map. The choice of buffer distance affects the susceptibility evaluation results, and the AUC of the underlying model increases as the buffer distance increases.
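The factor screening in item (1) relies on two standard diagnostics: pairwise Pearson correlation and the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing factor j on all the other factors. The sketch below computes both with NumPy on synthetic data. The thresholds |r| >= 0.7 and VIF >= 10 are common rules of thumb assumed here for illustration, not values taken from the thesis, and the helper names (`vif`, `screen_factors`) are hypothetical.

```python
import numpy as np

def pearson_matrix(X):
    """Pairwise Pearson correlation coefficients between factor columns."""
    return np.corrcoef(X, rowvar=False)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of
    column j on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()  # residual mean is 0 because of the intercept
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def screen_factors(X, names, r_max=0.7, vif_max=10.0):
    """Flag factor pairs with |r| >= r_max and factors with VIF >= vif_max.
    Both thresholds are assumed rules of thumb."""
    R = pearson_matrix(X)
    p = len(names)
    high_pairs = [(names[i], names[j], R[i, j])
                  for i in range(p) for j in range(i + 1, p)
                  if abs(R[i, j]) >= r_max]
    v = vif(X)
    high_vif = [(names[k], v[k]) for k in range(p) if v[k] >= vif_max]
    return high_pairs, high_vif

# Synthetic example: three independent factors plus a near-copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=500)])
names = ["elevation", "slope", "aspect", "elevation_copy"]
pairs, high = screen_factors(X, names)
print(pairs)
print(high)
```

Only the deliberately duplicated pair is flagged; a factor system that passes both checks, as reported in item (1), would return two empty lists.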
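Item (2) credits the AFM model with assigning a weight to each factor combination. In the attentional factorization machine of Xiao et al. (2017), each pair of factor embeddings is combined by an element-wise product, a small attention network scores the pair, and a softmax turns the scores into weights that sum to 1. The sketch below shows only that forward pass in NumPy, with random, untrained parameters. The parameter names, embedding size, and the assumption that factors are numerically encoded and normalized are illustrative, not details of the thesis model, and training is omitted.

```python
import numpy as np
from itertools import combinations

def afm_forward(x, w0, w, V, W_att, b_att, h, p):
    """Forward pass of an attentional factorization machine for one sample.

    x            : (n,) numerically encoded factor values (assumed normalized)
    w0, w        : global bias and (n,) linear weights
    V            : (n, k) embedding vector per factor
    W_att, b_att : attention layer weights (t, k) and bias (t,)
    h            : (t,) attention output vector
    p            : (k,) projection vector for the pooled interactions
    Returns (landslide probability, {(i, j): attention weight}).
    """
    pairs = list(combinations(range(len(x)), 2))
    # Pairwise interaction vectors (v_i x_i) * (v_j x_j), element-wise
    inter = np.array([V[i] * x[i] * V[j] * x[j] for i, j in pairs])  # (P, k)
    # Attention scores h^T ReLU(W e_ij + b), then softmax over all pairs
    scores = np.maximum(inter @ W_att.T + b_att, 0.0) @ h
    scores = scores - scores.max()  # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()
    pooled = (a[:, None] * inter).sum(axis=0)  # attention-weighted sum, (k,)
    logit = w0 + w @ x + p @ pooled
    prob = 1.0 / (1.0 + np.exp(-logit))  # sigmoid for a binary landslide label
    return prob, dict(zip(pairs, a))

rng = np.random.default_rng(0)
names = ["elevation", "slope", "aspect", "lithology"]  # illustrative subset
n, k, t = len(names), 8, 4
params = dict(w0=0.0, w=rng.normal(size=n), V=rng.normal(size=(n, k)),
              W_att=rng.normal(size=(t, k)), b_att=np.zeros(t),
              h=rng.normal(size=t), p=rng.normal(size=k))
prob, att = afm_forward(rng.normal(size=n), **params)
top = max(att, key=att.get)
print(f"P(landslide) = {prob:.3f}; largest pair weight: {names[top[0]]} x {names[top[1]]}")
```

Once the parameters are trained, the dictionary of attention weights is what can be read as the relative importance of each factor combination.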
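The sample-ratio optimization in item (3) treats "model AUC as a function of the non-landslide-to-landslide ratio" as an expensive, noisy black box. A minimal Bayesian optimization sketch is shown below, assuming a Gaussian-process surrogate (scikit-learn's `GaussianProcessRegressor`) and an expected-improvement acquisition over a discrete grid of candidate ratios. The thesis does not state its surrogate, acquisition function, search range, or base model; the GP/EI choice, the 0.5:1 to 10:1 grid, logistic regression as the model, and the synthetic sample pools are all assumptions, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def make_pools(rng, n_pos=200, n_neg=5000, n_feat=5):
    """Hypothetical synthetic landslide (pos) and candidate non-landslide (neg) cells."""
    return (rng.normal(0.8, 1.0, size=(n_pos, n_feat)),
            rng.normal(0.0, 1.0, size=(n_neg, n_feat)))

def auc_for_ratio(ratio, pos, neg, test_X, test_y, rng):
    """Train with `ratio` non-landslide samples per landslide; return test AUC."""
    n_neg = int(round(ratio * len(pos)))
    idx = rng.choice(len(neg), size=n_neg, replace=False)
    X = np.vstack([pos, neg[idx]])
    y = np.r_[np.ones(len(pos)), np.zeros(n_neg)]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(test_y, model.predict_proba(test_X)[:, 1])

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization under a Gaussian posterior."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def optimize_ratio(objective, candidates, n_init=3, n_iter=7, seed=0):
    """Sequentially pick the candidate ratio with the highest EI."""
    rng = np.random.default_rng(seed)
    tried = list(rng.choice(candidates, size=n_init, replace=False))
    scores = [objective(r) for r in tried]
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(1e-4),
                                  normalize_y=True, random_state=seed)
    for _ in range(n_iter):
        gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))
        remaining = np.array([c for c in candidates if c not in tried])
        if remaining.size == 0:
            break
        mu, sd = gp.predict(remaining.reshape(-1, 1), return_std=True)
        nxt = remaining[np.argmax(expected_improvement(mu, sd, max(scores)))]
        tried.append(nxt)
        scores.append(objective(nxt))
    best = int(np.argmax(scores))
    return tried[best], scores[best], list(zip(tried, scores))

rng = np.random.default_rng(42)
pos, neg = make_pools(rng)
test_pos, test_neg = make_pools(rng, n_pos=100, n_neg=1000)
test_X = np.vstack([test_pos, test_neg])
test_y = np.r_[np.ones(len(test_pos)), np.zeros(len(test_neg))]
candidates = np.arange(1, 21) * 0.5  # ratios 0.5:1 ... 10:1 (assumed range)
best_ratio, best_auc, history = optimize_ratio(
    lambda r: auc_for_ratio(r, pos, neg, test_X, test_y, rng), candidates)
print(f"best ratio {best_ratio}:1, AUC {best_auc:.3f}")
```

The `WhiteKernel` term lets the surrogate absorb the run-to-run noise caused by drawing different non-landslide samples at each evaluation. Swapping in a random forest or another model only changes `auc_for_ratio`.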
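Item (4) generates confidence maps by repeating buffer-controlled sampling. One way to picture it: exclude every cell within a buffer distance of a mapped landslide, draw non-landslide samples from the remaining cells, refit the model, and repeat, so that each cell collects a distribution of predictions rather than a single value. The sketch below makes several assumptions not given in the abstract: a 1:1 sample ratio, logistic regression as the model, Euclidean grid distance, and "confidence" defined as the fraction of runs in which a cell lands in the top 20% of predicted probabilities. All function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def buffer_mask(coords, landslide_xy, distance):
    """True for cells farther than `distance` from every landslide point."""
    d = np.sqrt(((coords[:, None, :] - landslide_xy[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1) > distance

def confidence_map(coords, features, landslide_idx, distance, n_runs=20,
                   high_quantile=0.8, seed=0):
    """Repeat buffer-controlled non-landslide sampling `n_runs` times.

    Returns the per-cell mean probability, its standard deviation across
    runs, and the fraction of runs in which the cell falls in the top
    (1 - high_quantile) share of predictions (an assumed confidence measure).
    """
    rng = np.random.default_rng(seed)
    eligible = np.flatnonzero(buffer_mask(coords, coords[landslide_idx], distance))
    probs = np.empty((n_runs, len(coords)))
    for r in range(n_runs):
        neg_idx = rng.choice(eligible, size=len(landslide_idx), replace=False)
        X = np.vstack([features[landslide_idx], features[neg_idx]])
        y = np.r_[np.ones(len(landslide_idx)), np.zeros(len(neg_idx))]
        model = LogisticRegression(max_iter=1000).fit(X, y)
        probs[r] = model.predict_proba(features)[:, 1]
    high = probs >= np.quantile(probs, high_quantile, axis=1, keepdims=True)
    return probs.mean(axis=0), probs.std(axis=0), high.mean(axis=0)

# Synthetic 40 x 40 grid where landslides cluster at large x.
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(40), np.arange(40))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
features = np.column_stack([coords[:, 0] / 40 + rng.normal(0, 0.1, len(coords)),
                            rng.normal(size=len(coords))])
landslide_idx = rng.choice(np.flatnonzero(coords[:, 0] > 30), size=30, replace=False)
mean_p, std_p, conf = confidence_map(coords, features, landslide_idx, distance=3.0)
```

`std_p` shows where a single susceptibility map would be least reliable. Rerunning with several values of `distance` mirrors the buffer-distance comparison described in item (4).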
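The zoning accuracy check in item (4) uses frequency ratios. For susceptibility class k, FR_k is the share of all landslide cells that fall in class k divided by the share of all cells that class k occupies. Values above 1 mean the class holds more landslides than its area alone would suggest. The short sketch below computes FR per class. The function name and the toy ten-cell map are illustrative.

```python
import numpy as np

def frequency_ratio(zone, is_landslide, n_classes):
    """FR_k = (landslide cells in k / all landslide cells) / (cells in k / all cells)."""
    zone = np.asarray(zone)
    is_landslide = np.asarray(is_landslide, dtype=bool)
    fr = np.full(n_classes, np.nan)  # NaN for classes with no cells
    for k in range(n_classes):
        in_k = zone == k
        area_share = in_k.mean()
        if area_share > 0:
            fr[k] = (is_landslide[in_k].sum() / is_landslide.sum()) / area_share
    return fr

# Ten cells in three classes (0 = low, 1 = moderate, 2 = high), three landslides.
zone = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
slides = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
fr = frequency_ratio(zone, slides, 3)  # approximately [0, 0.833, 3.333]
```

FR values that rise steadily from the lowest to the highest class are commonly read as a sign of sensible zoning, which is how traditional and confidence-based maps can be compared.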