Research On Soil Sample Representativeness Correction Based On Kernel Density Estimation

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Chen
GTID: 2542307160977999
Subject: Agricultural engineering and information technology
Abstract/Summary:
Soil is an important component in maintaining material cycling and energy flow within ecosystems and is a crucial resource for sustaining human production and livelihoods. Accurate spatial distribution information on soil resources provides a scientific basis for land use and planning and data support for land resource evaluation and management. With the development of computer science and 3S technology, digital soil mapping has emerged as a new soil survey method that constructs soil-landscape models based on soil-forming factor theory and the first law of geography. Establishing the co-variation relationship between soil and environmental factors is critical to digital soil mapping and requires reliable soil knowledge obtained from field sampling points. However, for various reasons, the collected samples may be spatially biased. Obtaining representative soil samples, and achieving high-precision mapping from them, has therefore become one of the bottlenecks in digital soil mapping research, and correcting the spatial bias of samples is an effective way to address it. Based on soil-landscape model theory and the sample correction theory of volunteered geographic information, this thesis uses high-dimensional environmental covariate information, such as digital elevation models and remote sensing imagery, to assist mapping and to correct the spatial representativeness of low-dimensional sample points, in order to obtain more accurate spatial distribution data on soil resources. The main conclusions are as follows:

(1) Acquisition of the environmental covariate sample space and population space. The environmental covariates required for predictive mapping of soil organic matter were selected and processed. Seven environmental factors were used: the normalized difference vegetation index extracted from Sentinel-2 remote sensing images, and six variables derived from the digital elevation model (elevation, slope, aspect, plan curvature, profile curvature, and topographic wetness index). After outlier processing, data stretching, and principal component analysis, the top three principal components, which together captured over 95.55% of the information, were extracted as new covariate layers. The field sampling points were divided into two groups, A and B, to serve as control groups. The value of each covariate principal component was then extracted both at the field sampling points and at sampling points covering the entire gradient of the covariate space, establishing the sample space and the population space based on environmental covariates. The purpose of this step is to obtain datasets describing the distribution of the samples and the distribution of the population.

(2) Calculation and correction of sample representativeness. For each principal component covariate layer of the samples in groups A and B, the bandwidth of the sample space was determined by grid search, giving 2.595, 4.3288, and 4.5349 for group A and 6.2803, 6.5793, and 2.595 for group B. The bandwidths of the population space were calculated using the empirical rule, giving 5.7797, 2.7292, and 2.3743. The probability density distribution of each covariate component was estimated using a machine learning density estimator, and the similarity between the sample space and the population space was calculated by trapezoidal integration. The total similarity was obtained by weighted normalization. The resulting sample representativeness values for groups A and B were 0.8138 and 0.8660, respectively. Three heuristic algorithms (genetic algorithm, differential evolution, and particle swarm optimization) were then used to iterate a set of sample weights based on this representativeness calculation. With the optimal weights, the total similarity improved from 0.8138 and 0.8660 to 0.8881 and 0.9456, respectively, and the weight distributions and optimization performance of the different algorithms were analyzed. The sample weights calculated by the different algorithms showed some similarity in their geographical distribution. In terms of optimization performance, both the genetic algorithm and particle swarm optimization converged faster than differential evolution. The genetic algorithm also performed well in both the way it iterated weights and the speed of similarity improvement, converging within 25 generations with a stable similarity improvement strategy.

(3) Predictive mapping and accuracy verification. A multiple linear regression model was fitted by least squares estimation to establish the linear relationship between soil organic matter content and the environmental factors. The sample weights were introduced into the calculation of the residual sum of squares, and soil organic matter content was mapped using both weighted and unweighted training samples. The root mean square error and mean absolute error were calculated at 32 independent sampling points for accuracy evaluation and significance analysis of the mapping results, so that the effect and reliability of the method in improving soil mapping accuracy could be evaluated. The results showed that after fitting with the optimal weights, the root mean square error and mean absolute error of mapping for groups A and B decreased by 4.27% and 7.87%, and 10.30% and 12.74%, respectively. The sample representativeness correction method based on kernel density estimation can therefore significantly improve the accuracy of soil organic matter mapping. For each group, the relationship between sample representativeness and mapping accuracy was analyzed using a linear regression t-test, and the results showed that mapping accuracy was positively correlated with sample representativeness. For different algorithms within the same group, the order of significance was genetic algorithm > differential evolution algorithm > particle swarm optimization algorithm. Across groups, the group with higher initial sample representativeness benefited more from the correction.
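The representativeness calculation in conclusion (2) can be illustrated with a minimal sketch. The abstract does not give the exact similarity formula, so the code below assumes the similarity of one covariate component is the overlap area between the sample density and the population density, integrated with the trapezoidal rule. The Gaussian kernel, the function names, the grid size, and the use of component weights for the total similarity are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def gaussian_kde(points, bandwidth, weights=None):
    """Return a function x -> (optionally weighted) Gaussian kernel density."""
    n = len(points)
    if weights is None:
        weights = [1.0 / n] * n
    else:
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize weights to sum to 1
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            for p, w in zip(points, weights)
        )

    return density


def trapezoid(ys, dx):
    """Trapezoidal rule on an evenly spaced grid with spacing dx."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def overlap_similarity(sample, population, bw_sample, bw_population,
                       weights=None, grid_size=512):
    """Assumed similarity: area under min(sample density, population density)."""
    f_s = gaussian_kde(sample, bw_sample, weights)
    f_p = gaussian_kde(population, bw_population)
    pad = 4.0 * max(bw_sample, bw_population)  # cover the kernel tails
    lo = min(min(sample), min(population)) - pad
    hi = max(max(sample), max(population)) + pad
    dx = (hi - lo) / (grid_size - 1)
    xs = [lo + i * dx for i in range(grid_size)]
    return trapezoid([min(f_s(x), f_p(x)) for x in xs], dx)


def total_similarity(similarities, component_weights):
    """Weighted, normalized combination of per-component similarities."""
    total = sum(component_weights)
    return sum(s * w for s, w in zip(similarities, component_weights)) / total
```

Under these assumptions the similarity lies between 0 (no overlap) and 1 (identical densities). Passing per-sample `weights` to `overlap_similarity` changes the sample density, which is the quantity a heuristic optimizer such as a genetic algorithm would adjust to raise the total similarity.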
Keywords/Search Tags: Environmental covariates, Spatial deviation, Sample representativeness, Heuristic algorithm, Multiple linear regression
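The weighted fitting step described in conclusion (3) of the abstract, where sample weights enter the residual sum of squares of a multiple linear regression, might look like the following sketch. It solves the weighted normal equations with a small Gaussian elimination routine so that it needs only the standard library. All function names are hypothetical, and any data passed in would be illustrative rather than taken from the thesis.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_least_squares(features, targets, weights):
    """Minimize sum_i w_i * (y_i - b0 - sum_j b_j * x_ij)^2.

    Returns [b0, b1, ..., bk], with the intercept first.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtwx = [
        [sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [
        sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
        for i in range(k)
    ]
    return solve_linear(xtwx, xtwy)


def rmse_mae(features, targets, coefs):
    """Root mean square error and mean absolute error on validation points."""
    errors = [
        coefs[0] + sum(c * v for c, v in zip(coefs[1:], f)) - y
        for f, y in zip(features, targets)
    ]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae
```

Setting every weight to 1 reduces this to ordinary least squares, which gives the unweighted baseline against which the weighted fit would be compared on the independent validation points.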