Geological hazards are an important type of natural disaster, and every country experiences them to varying degrees. China, with its vast territory, undulating terrain, and wide variety of landforms and natural landscapes, is among the countries most seriously affected by geological hazards. Landslides, collapses, debris flows and ground subsidence occur every year. Numerous studies have shown that rainfall is one of the most important factors triggering geological hazards. In addition, under the influence of climate change, social and economic development, and intensifying human activity, the frequency of geological hazards is rising, with serious negative impacts on people’s lives and on economic development. As research on the non-linear dynamic systems of landslides and environmental factors has advanced, data-driven methods represented by machine learning have become a research hotspot. Landslide susceptibility assessment is the main way to characterise the spatial probability of landslide occurrence. However, both the selection of non-landslide units by random sampling and the choice of evaluation model introduce randomness and uncertainty. Meanwhile, critical rainfall thresholds for rainfall-induced landslides address the temporal probability of landslide occurrence, but existing threshold models have a high false alarm rate. It is therefore of practical significance to study risk assessment, early warning technology and management countermeasures for rainfall-induced landslides, so as to provide a scientific basis and technical support for disaster prevention, disaster mitigation and plan preparation. To address these problems, this paper takes Huangpu District of Guangzhou City as the study area and conducts a study on the evaluation of landslide susceptibility
and critical rainfall threshold based on machine learning. The main research contents and findings are as follows:

(1) Basic geological data and field exploration data are combined to summarize the characteristics of landslide development. A total of 16 factors are extracted, including the elevation, slope, slope shape, surface relief, lithology, ground cover and average annual rainfall of the landslide sites. Landslide susceptibility conditions in the study area are characterised by quantitatively calculating the landslide density, information value and frequency ratio for each contributing factor.

(2) Treating landslide susceptibility modeling as a positive and unlabeled (PU) learning problem, a two-step convolutional neural network framework (ISpy) that combines the information value model with the Spy technique is proposed and applied to screen high-confidence non-landslide samples. The results show that, compared with random sampling and the traditional Spy algorithm, the ROC accuracy of the machine learning models using ISpy improves by approximately 1.9% to 3.1% and 5.4% to 5.8%, respectively. The reliable negative samples screened by the improved ISpy sampling strategy contain less noise and are purer, which further improves classification performance.

(3) Landslide susceptibility evaluation models based on support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) are built and compared to identify the better evaluation process and method. The results show that, in the second step of PU learning for landslide susceptibility, the CNN model achieves higher frequency ratio accuracy than the RF and SVM models, and its distribution of landslide susceptibility areas is more reasonable. Model accuracy increases as a power function of the number of training samples. The sample set screened by the ISpy framework showed higher
stability, prediction accuracy and growth rate than random sampling and the traditional Spy technique when the same number of training samples was added.

(4) The landslide event cumulative rainfall-duration (E-D) threshold, the normalized landslide event cumulative rainfall-duration (EMAP-D) threshold and the E-D threshold based on stratigraphic lithology classification are each calculated by quantile regression. By analysing the relationship between the daily rainfall on the day of landslide failure and the cumulative rainfall over different periods within the preceding 20 days, a three-dimensional critical rainfall threshold that incorporates the antecedent cumulative rainfall is proposed. The comparison shows that it significantly reduces the false alarm rate caused by rainfall events that do not trigger landslides and has superior ROC performance.
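
As a rough illustration of the quantile-regression step in item (4), the sketch below fits an E-D threshold in log-log space. Everything specific here is an assumption made for illustration rather than a detail taken from the paper: the power-law form E = alpha * D^beta, the 5% quantile level, the function names fit_ed_threshold and exceeds_threshold, and the sample rainfall values. The fit uses the fact that a two-parameter quantile regression line can always be chosen to pass through two observations, so a small data set can be solved exactly by checking every pair.

```python
import math


def pinball_loss(residuals, tau):
    """Sum of the quantile (pinball) loss over residuals at level tau."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def fit_ed_threshold(durations, rainfalls, tau=0.05):
    """Fit E = alpha * D**beta at quantile tau (assumed power-law form).

    Works in log10-log10 space and searches every line through two
    observations, which contains an exact minimiser of the pinball loss
    for a straight-line model. Cost is O(n^3), so this is only meant
    for small catalogues of rainfall events.
    """
    x = [math.log10(d) for d in durations]
    y = [math.log10(e) for e in rainfalls]
    best = None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue  # vertical line, not a valid candidate
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
            loss = pinball_loss(residuals, tau)
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two events with distinct durations")
    _, intercept, slope = best
    return 10 ** intercept, slope  # (alpha, beta)


def exceeds_threshold(duration, rainfall, alpha, beta):
    """True if cumulative rainfall E is at or above the fitted threshold."""
    return rainfall >= alpha * duration ** beta


# Hypothetical landslide-triggering events: duration D (h), rainfall E (mm).
sample_durations = [2, 6, 12, 24, 48, 72]
sample_rainfalls = [18.0, 35.0, 60.0, 85.0, 140.0, 160.0]
alpha, beta = fit_ed_threshold(sample_durations, sample_rainfalls, tau=0.05)
print(f"threshold: E = {alpha:.2f} * D^{beta:.2f}")
```

A low quantile level places the curve near the lower envelope of the triggering events, so most recorded landslides lie on or above it. For larger event catalogues a linear-programming solver or a library quantile regression routine would be the more practical choice than this exhaustive search.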