Studies On The Prediction And Evaluation Of Poplar’s Waterlogging Tolerance Via Machine Learning Algorithms

Posted on: 2024-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: B Z Han
GTID: 2543307160479594
Subject: Applied Statistics
Abstract/Summary:
Poplar is widely planted across large areas of China, especially in the north, because of its strong waterlogging tolerance, high adaptability and wide range of varieties. It has high ecological, scientific and economic value. Flooding has long hindered the development of plant ecosystems and habitats, particularly in the middle and lower reaches of the Yangtze River. Analyzing and screening the important characteristics related to the waterlogging tolerance of poplar, and selecting stress-resistant varieties, is therefore important for the protection of plant ecosystems and for economic and social development in China.

First, this study draws on the dynamic changes in seedling height, ground diameter and biomass of 20 poplar varieties before and after a waterlogging experiment. After preprocessing the data for missing values, abnormal values and outliers, the three indexes were subjected to a normality test (via Q-Q plots), parametric tests (analysis of variance and t-test) and a non-parametric test (Mann-Whitney U test), and the waterlogging resistance indexes that differed significantly among poplars were screened out. The indexes that passed these tests were then weighted by the waterlogging resistance coefficient to construct a new index measuring the waterlogging resistance of poplar, and a cluster analysis dendrogram was used to bring poplar variety into the study of waterlogging resistance.

Second, 26 poplar features retained after expert screening were further screened by recursive feature elimination. Five machine learning models (XGBoost, LightGBM, Gradient Boosting, Decision Tree and AdaBoost) and fusion models were then used to model and analyze 13 features of the 20 poplar varieties. The coefficient of determination (R²), root mean square error (RMSE), explained variance score (EVS) and mean absolute error (MAE) were used to evaluate each model on the training set and on the test set, and the GPI index was used to evaluate each model on the test set. The LightGBM and Gradient Boosting models performed best: their GPI index was the highest among the single models (that is, excluding the fusion models), and the coefficient of determination of both models on the training set exceeded 0.96. On the test set, the best-fitting model, LightGBM, reached an R² of 0.8433, an explained variance score of 0.9237, a mean absolute error of 0.1680 and a root mean square error of 0.2346.

Finally, the feature importance of the LightGBM and Gradient Boosting models was analyzed by permutation importance and by average SHAP value, and interpreted in combination with biometric factors. For the LightGBM model, the top 10 features selected by the two ranking methods are exactly the same, which indicates that features such as Fm, H2OS and Fo are the more important ones in the model built for the waterlogging resistance index.

In summary, the waterlogging resistance index and the related regression model proposed in this study achieve high prediction accuracy and efficiency, and the important characteristics affecting the waterlogging resistance of poplar are identified and ranked. These results are significant for further predicting the waterlogging resistance of poplar and for selecting strongly waterlogging-resistant varieties. They also provide a reference for the planting of waterlogging-resistant poplar varieties and for the realization of carbon neutrality.
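The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of R², EVS, MAE and RMSE, which are assumed here to match the ones used in the thesis; the function name regression_metrics and the toy values are hypothetical and are not the thesis data. The sketch also shows one way R² and EVS can differ on the same predictions, as they do in the reported test-set results: EVS subtracts the mean residual before measuring the unexplained variance, so a constant bias in the predictions lowers R² but not EVS.

```python
from math import sqrt
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Return R2, EVS, MAE and RMSE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fmean(y_true)
    mean_residual = fmean(residuals)

    # Total variation of the observations around their mean.
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    if ss_total == 0:
        raise ValueError("y_true is constant, so R2 and EVS are undefined")

    # Raw squared error (used by R2 and RMSE).
    ss_residual = sum(r ** 2 for r in residuals)
    # Squared error after removing the mean residual (used by EVS).
    ss_residual_centered = sum((r - mean_residual) ** 2 for r in residuals)

    return {
        "R2": 1 - ss_residual / ss_total,
        "EVS": 1 - ss_residual_centered / ss_total,
        "MAE": fmean([abs(r) for r in residuals]),
        "RMSE": sqrt(ss_residual / len(residuals)),
    }


# Toy values (hypothetical): every prediction is too high by exactly 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
scores = regression_metrics(y_true, y_pred)
print(scores)
```

On these toy values the constant bias of 0.5 gives an R² of 0.8 while the EVS stays at 1.0, and both MAE and RMSE equal 0.5. A gap between R² and EVS on a real test set is therefore consistent with a systematic offset in the predictions, although the abstract itself does not state the cause.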
Keywords/Search Tags: machine learning method, poplar waterlogging resistance, GPI index, feature importance