| Forest fires are natural disasters that disrupt the balance of forest ecosystems. Building a spatial prediction model for forest fires is an important approach to controlling and managing them. Such a model helps uncover the mechanisms and risks associated with forest fire occurrence. It also allows us to estimate the likelihood of future forest fires in specific regions and to perform risk zoning, which contributes to science-based fire prevention. Data-driven models are now commonly used for forest fire spatial prediction. In these models, negative samples (non-fire data) are an important part of the training data, and their quality strongly affects prediction performance. However, existing non-fire sampling methods cannot quantify the credibility of the collected non-fire data, and the collected data are not sufficiently representative, so the resulting non-fire samples cannot fully describe the geographical and environmental characteristics of non-fire areas. This reduces the performance of the forest fire spatial prediction model. Therefore, this thesis proposes a non-fire data sampling method for forest fire spatial prediction based on geographical similarity to improve the quality and representativeness of non-fire samples. The core idea is that the less similar the geographical environment of a candidate sample point is to that of the historical forest fire points, the greater its credibility as a non-fire sample. We chose Yunnan Province, China, as the study area. First, we constructed a non-fire data sampling model based on the principle of geographic similarity and collected non-fire data with different levels of credibility. We then applied these non-fire samples of varying credibility to three commonly used forest fire risk
prediction models: logistic regression (LR), support vector machine (SVM), and random forest (RF), in order to assess how the credibility level of non-fire data affects prediction performance. Second, we collected non-fire samples using both traditional sampling methods and the geographic-similarity sampling method proposed in this study. We combined each of the two non-fire datasets with a known historical forest fire dataset (fire samples) to create training datasets, and applied them to the LR, SVM, and RF models to examine whether the proposed method yields more representative non-fire samples and improves prediction performance. The main research results are as follows: (1) The credibility threshold of the non-fire samples significantly affects the prediction quality of the forest fire spatial prediction model; that is, prediction accuracy improves most at the optimal credibility threshold. The non-fire sampling method based on geographic similarity proposed in this study can also better quantify the credibility of non-fire samples. (2) Compared with the traditional non-fire sampling method, the modeling accuracy of the forest fire spatial prediction models built with the proposed sampling method improved significantly. With historical fire data from 2010 and non-fire data collected using the proposed method, the modeling accuracy of the LR model increased by 22%, SVM by 11.96%, and RF by 15.7%. Similarly, for the models built from historical forest fire data from 2020, the
modeling accuracy of the LR model increased by 16%, SVM by 13.2%, and RF by 10%. (3) Likewise, compared with the traditional non-fire sampling method, the prediction accuracy of the forest fire spatial prediction models built with the proposed sampling method improved significantly. For the models built from non-fire data generated by the proposed method and historical forest fire data (fire points) from 2010, the prediction accuracy of the LR model increased by 37.6%, SVM by 35.2%, and RF by 25.8%. For the models built from the corresponding non-fire data and historical forest fire data from 2020, the prediction accuracy of the LR model increased by 27.8%, SVM by 23.2%, and RF by 22%. Based on these experimental results, we conclude that collecting non-fire samples according to the principle of geographic similarity is an effective way to improve the quality and representativeness of non-fire samples in forest fire risk prediction. |
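
The core idea stated in the abstract (lower environmental similarity to historical fire points means higher credibility as a non-fire sample) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so everything below is an assumption made for illustration: the function names `non_fire_credibility` and `sample_non_fire`, the Gaussian per-variable similarity, the minimum over variables, the maximum over fire points, and the toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np

def non_fire_credibility(candidates, fire_points):
    """Credibility in [0, 1] of each candidate as a non-fire sample.

    Rows are points, columns are environmental variables
    (for example elevation or temperature; hypothetical here).
    """
    candidates = np.asarray(candidates, dtype=float)
    fire_points = np.asarray(fire_points, dtype=float)

    # Assumption: scale each variable by its spread over the fire points.
    scale = fire_points.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant variables

    # diff[i, j, k]: scaled difference between candidate i and
    # fire point j on variable k (computed via broadcasting).
    diff = (candidates[:, None, :] - fire_points[None, :, :]) / scale

    # Assumption: Gaussian kernel as the per-variable similarity.
    sim_var = np.exp(-0.5 * diff ** 2)

    # Assumption: similarity to one fire point is limited by the
    # least similar variable (minimum over variables).
    sim_point = sim_var.min(axis=2)

    # Similarity to the fire set is that of the most similar fire point.
    sim_fire = sim_point.max(axis=1)

    # Less similar to fire points means more credible as non-fire.
    return 1.0 - sim_fire

def sample_non_fire(candidates, fire_points, threshold):
    """Keep candidates whose credibility is at least `threshold`."""
    candidates = np.asarray(candidates, dtype=float)
    cred = non_fire_credibility(candidates, fire_points)
    keep = cred >= threshold
    return candidates[keep], cred[keep]

# Toy data: two environmental variables per point (illustrative values only).
fire_points = [[0.0, 0.0], [1.0, 1.0]]
candidates = [[0.0, 0.0], [10.0, 10.0]]

cred = non_fire_credibility(candidates, fire_points)
kept, kept_cred = sample_non_fire(candidates, fire_points, threshold=0.9)
```

In this toy case the first candidate coincides with a fire point and receives zero credibility, while the second lies far from both fire points and is retained. Varying `threshold` corresponds to the credibility levels compared in result (1).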