
Comprehensive Application And Study Of Machine Learning In Susceptibility Evaluation Of Shallow Landslides

Posted on: 2022-08-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Liang
GTID: 1480306728981419
Subject: Civil engineering
Abstract/Summary:
Collapses, landslides and debris flows are the most common natural geological disasters in the mountainous areas of China. Because of their high frequency and great destructive power, they pose a major threat to lives and property as China's economy develops and urbanization advances. According to the "National Geological Hazard Bulletin", China recorded 7,840 geological disasters in 2020, including 4,810 landslides, 1,797 collapses and 899 debris flows (mudslides). Collapses, landslides and debris flows together accounted for 95.7% of all geological disasters, causing heavy loss of life and economic damage. Experts predicted that the prevention and control of geological disasters in 2021 would remain a serious challenge. Evaluating collapses, landslides and debris flows is therefore of great practical significance. As a key step in disaster prevention and mitigation, susceptibility assessment has attracted researchers worldwide for many years. Under a broad definition based on landslide depth, most common collapses, landslides and debris flows are shallow landslides (depth less than 10 m). Thanks to advances in Remote Sensing (RS), Geographic Information Systems (GIS), the Global Positioning System (GPS), computer technology and mathematical theory, major breakthroughs have been made in the theories and algorithms for evaluating the susceptibility of collapses, landslides and debris flows. However, several problems remain in practice: the number of shallow landslide samples cannot be controlled; the purity of non-landslide samples cannot be guaranteed; model evaluation indicators are simplistic; and it is unclear how to explore the potential relationship between debris flows and landslides in a region, how to prepare comprehensive zoning maps of multiple disasters, and how to apply prediction models across regions. These problems are distinct but interrelated.
This thesis takes three areas with frequent collapses, landslides and debris flows as study areas: Longzi Town in Longzi County, Shannan City; Yadong County, Xigaze City; and Miyun District, Beijing. The research aims to improve the purity of collected non-landslide samples, reduce the number of landslide samples required for modeling, broaden the model evaluation system, produce a comprehensive susceptibility zoning map of multiple disasters, and cross-validate the models across different areas. A variety of supervised and unsupervised learning algorithms are used, with the goal of identifying accurate and reliable hazard-conditioning factors, characterizing the distribution of shallow landslides, and obtaining machine learning models with high prediction accuracy. The results for the three study areas differ but are closely related. They provide guidance for land planning and use, disaster prevention and mitigation, and engineering construction in the study areas, as well as a reference for similar studies in other plateau and plain areas. The main results of the thesis are as follows:
1. The thesis reviews the development of susceptibility evaluation for collapses, landslides and debris flows, clarifies the related concepts, introduces the methods used to collect and process the data, and describes how the prediction models are built and evaluated.
2. A method is proposed to improve the purity of non-landslide samples. With Miyun District as the study area, a shallow landslide susceptibility zoning map generated by fuzzy C-means clustering serves as the initial map, and the existing shallow landslide samples are used for verification. A set number of non-landslide samples is then randomly selected from the lowest susceptibility level of the initial map, providing a high-quality, complete database for four ensemble models: Stacking, AdaBoost decision tree (AdaBoost-DT, Ada-DT), random forest (RF) and gradient boosting decision tree (GBDT). The Stacking model achieves the highest prediction accuracy, with sensitivity, specificity, accuracy and AUC of 91.78%, 90.54%, 91.16% and 0.944, respectively. In addition, the main conditioning factors of shallow landslides in the study area are analyzed by combining the Gini index with the frequency ratio, which adds to the practical value of the Stacking algorithm. The predicted high-susceptibility areas are consistent with the field survey results.
3. Multiple shallow landslide susceptibility prediction models are established for Miyun District at the same time. With prediction accuracy as the core criterion, further indicators (implementation difficulty, complexity, and the advantages and disadvantages of each model) are added, broadening the evaluation of machine learning models into a new, multi-angle evaluation system. A comprehensive comparison of supervised and unsupervised learning algorithms provides a reference for algorithm selection in other study areas.
4. By introducing the idea of anomaly detection and adopting the isolation forest, the constraint that the number of landslide samples places on modeling is relaxed, offering a way to ease the problem of limited samples in shallow landslide susceptibility evaluation. Isolation forest is an unsupervised learning algorithm: samples need not be labeled, and no dedicated training set is required. Modeling involves two main parameters, the sampling size S and the number of trees t. With Yadong County as the study area, the optimized values are S = 300 and t = 110, and the landslide samples in the study area serve as the main targets to be detected by the isolation forest. The isolation forest achieves accuracy = 86.96%, specificity = 82.19% and AUC = 0.917. Another advanced ensemble algorithm, Ada-DT, performs better in accuracy, specificity and AUC (92.99%, 93.22% and 0.988, respectively). In sensitivity, however, the isolation forest (93.58%) outperforms Ada-DT (89.88%).
5. The Ada-DT model has higher overall prediction accuracy, but noise from non-landslide samples may affect the performance of this binary classifier. The isolation forest, by contrast, is modeled without competition from abnormal samples, so its predictions may be biased. The high-susceptibility areas predicted by both models largely agree with the field survey. A sampling method is therefore proposed that selects non-landslide samples from the extremely low susceptibility areas predicted by the isolation forest, further improving the purity of non-landslide samples. This result supplements the landslide susceptibility study of Miyun District, and combining the results from the two study areas improves the quality of the samples required for supervised modeling. Unlike most machine learning algorithms, the isolation forest itself imposes no restriction on the number or scale of outliers in the sample data (in this thesis, outliers are landslide samples); indeed, the number of landslide samples used for modeling should not be excessive. Isolation forests therefore have good prospects in shallow landslide susceptibility evaluation and offer a solution to the difficulty of collecting landslide samples, their limited number, and the difficulty of guaranteeing sample purity.
6. With Longzi Town as the study area, the potential relationship between the occurrence of debris flows and landslides is explored. Based on the earlier results, RF is chosen as the modeling algorithm. Watershed units and slope units are used as mapping units to produce debris flow and landslide susceptibility zoning maps, respectively. As shown in the figure, the high-susceptibility areas predicted in both maps are consistent with the field investigation. The corresponding susceptibility levels of the two maps are then overlaid to explore the potential relationship between debris flows and landslides. The results show that the occurrence of debris flows is not necessarily or directly related to landslides, and the influence of debris flows on landslides is not obvious. Finally, the centroids of all slope units in the study area are converted into potential hazard points scattered over the watershed units. Integrating the susceptibility to both hazards, the study area is divided into low, medium and high susceptibility levels, giving a comprehensive landslide-debris flow susceptibility zoning map.
7. The Stacking, RF, GBDT and Ada-DT models are cross-validated across the three study areas to explore how well each generalizes outside its training area. The Stacking model shows the best transfer stability and GBDT the worst. Without additional training data, it is difficult for a model to reach optimal performance when applied to another region, but a similar geological background makes model transfer easier.
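Result 2 builds the initial zoning map with fuzzy C-means (FCM) clustering and draws non-landslide samples from its lowest susceptibility level. The abstract does not say how clusters are mapped to susceptibility levels, so the minimal sketch below assumes that clusters are ranked by the share of known landslides they contain (the existing landslides act as the verification indicator). The function names `fuzzy_c_means` and `sample_non_landslides`, the fuzzifier m = 2 and the fixed iteration count are hypothetical illustrative choices, not the thesis's settings; mapping units are assumed to be numeric feature tuples.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means: returns cluster centres and the membership matrix u[i][j]."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        # Centres are membership-weighted means with weights u^m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(d)])
        # Memberships are updated from relative distances to every centre.
        for i in range(n):
            dist = [max(math.dist(points[i], centres[j]), 1e-12) for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return centres, u


def sample_non_landslides(points, is_landslide, c, n_samples, seed=0):
    """Hypothetical helper: draw non-landslide samples from the FCM cluster with the
    lowest share of known landslides (assumed to be the lowest susceptibility level)."""
    _, u = fuzzy_c_means(points, c, seed=seed)
    # Hard label = cluster with the largest membership.
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    density = []
    for j in range(c):
        members = [i for i, lab in enumerate(labels) if lab == j]
        slides = sum(is_landslide[i] for i in members)
        # Empty clusters get +inf so they are never chosen as the lowest level.
        density.append(slides / len(members) if members else math.inf)
    lowest = min(range(c), key=lambda j: density[j])
    candidates = [i for i, lab in enumerate(labels) if lab == lowest and not is_landslide[i]]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_samples, len(candidates)))
```

In a real workflow the points would be the conditioning-factor vectors of each mapping unit and c would match the number of susceptibility levels; known landslide units are excluded from the candidates so they can never be drawn as negatives.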
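Result 2 ranks conditioning factors by combining the Gini index with the frequency ratio (FR). The FR of a factor class is conventionally the proportion of landslide cells falling in that class divided by the proportion of all cells in that class, so FR > 1 marks a class that is more landslide-prone than the study-area average. The minimal sketch below computes FR for one classified factor from per-cell class labels and a landslide flag; the function name is hypothetical, and the Gini importance, which comes from the fitted tree ensembles, is not reproduced here.

```python
from collections import Counter


def frequency_ratio(factor_classes, is_landslide):
    """FR per class = (share of landslide cells in the class) / (share of all cells in the class)."""
    total_cells = len(factor_classes)
    total_slides = sum(is_landslide)
    class_cells = Counter(factor_classes)
    # Count landslide cells per class; classes without landslides default to 0.
    slide_cells = Counter(c for c, s in zip(factor_classes, is_landslide) if s)
    return {
        cls: (slide_cells[cls] / total_slides) / (n / total_cells)
        for cls, n in class_cells.items()
    }
```

For example, a class covering half of the cells but containing three quarters of the landslides has FR = 0.75 / 0.5 = 1.5.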
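Results 2 and 4 report sensitivity, specificity, accuracy and AUC. The sketch below shows how these four indices are conventionally computed from binary labels (1 = landslide) and model scores. The 0.5 classification threshold and the function name `evaluate` are assumptions, since the abstract does not state the cut-off used; AUC is computed as the probability that a randomly chosen landslide scores higher than a randomly chosen non-landslide, with ties counted as one half.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy and AUC for binary labels (1 = landslide)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Pairwise (Mann-Whitney) AUC: landslide outranks non-landslide = 1, tie = 0.5.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "auc": wins / (len(pos) * len(neg)),
    }
```

Because accuracy is the class-weighted average of sensitivity and specificity, it always lies between the two; with equal numbers of landslide and non-landslide test samples it equals their mean, which the Stacking figures in result 2 satisfy ((91.78% + 90.54%) / 2 = 91.16%).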
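Result 4 tunes two isolation-forest parameters, the sampling size S and the number of trees t. The sketch below is a minimal from-scratch version of the original isolation forest formulation (Liu, Ting and Zhou, 2008), not the thesis's code: each tree isolates points by splitting on a random feature at a random value, and the anomaly score is s(x, S) = 2^(-E[h(x)] / c(S)), where h(x) is the path length and c(S) normalizes it. The class name `IsolationForestSketch` is hypothetical; S and t appear as `sample_size` and `n_trees`, defaulting to the optimized Yadong values (300 and 110). Mapping units are assumed to be numeric feature tuples, and landslides, as the outliers, are expected to receive the highest scores. For production use, scikit-learn's `IsolationForest` exposes the same two parameters as `max_samples` and `n_estimators`.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value between its min and max."""
    n = len(points)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # chosen feature is constant in this node: treat it as a leaf
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for leaves holding several points."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


class IsolationForestSketch:
    """Hypothetical minimal isolation forest: t = n_trees, S = sample_size."""

    def __init__(self, n_trees=110, sample_size=300, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two samples")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the original paper
        self.trees = [build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """s = 2^(-E[h(x)] / c(S)); values near 1 indicate easily isolated points."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))
```

Units with scores near 1 are easily isolated (candidate landslides), while those with the lowest scores correspond to the extremely low susceptibility areas from which result 5 proposes to draw non-landslide samples.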
Keywords/Search Tags: machine learning, landslide, debris flow, susceptibility mapping, GIS