| Citrus is an important cash crop, and its widespread cultivation and marketing are of great significance to the global agricultural economy. Traditionally, farmers or experts judge the suitable planting area for a citrus variety from historical experience. However, because climate, soil and cultivation techniques differ, the quality and economic returns of citrus planted in different areas also differ. A prediction model for suitable citrus planting sites can therefore help growers choose cultivation sites and planting methods and give agricultural experts a reference for decisions; it also has practical significance for improving citrus quality and planting efficiency, reducing wasted resources and promoting the sustainable development of the citrus industry. This paper uses the citrus quality data, sensory evaluation data and soil data for the key variety "Ehime 28" provided by the College of Landscape Architecture of Huazhong Agricultural University, combined with climate data for the corresponding areas collected with a Python crawler, to construct an optimal model for predicting citrus planting sites. The work covers three main areas of research: (1) Classifying citrus quality grade labels. The original data are unlabeled, and traditional research ranks citrus only on conventional quality indicators. This work adds sensory evaluation features and uses the entropy weight method to assign good and bad grade labels to the citrus quality of each planting site, which makes the labeling results more reliable. (2) Optimal classification model selection. At present, there are few studies, either domestically or internationally, on predicting suitable planting sites for citrus. In this study, seven classification models, namely XGBoost, CatBoost, support vector machine, K-nearest neighbor, logistic regression, decision tree and random forest, are
selected to classify planting suitability from climate and soil data. Their classification performance is compared on five evaluation metrics: area under the ROC curve (AUC), precision, recall, F1-score and accuracy. The experimental results show that XGBoost outperforms the other models on all metrics, with an accuracy of 0.7333 and an AUC of 0.7593, but still leaves room for optimization. (3) Data augmentation with a generative adversarial network (GAN). Because the dataset is small, tuning model parameters alone cannot improve prediction performance, and collecting a large amount of additional real data in a short time is difficult. This study therefore uses a GAN to generate an expanded training set whose data follow a distribution similar to the original, which is then combined with the classification models for prediction. The results show that the classification performance of every model improves; for the GAN-XGBoost model, accuracy rises from 0.7333 to 0.8182 and AUC from 0.7593 to 0.8864, relative improvements of 11.58% and 16.74% respectively, indicating that the optimization is effective. The model is therefore of practical value for predicting suitable citrus planting sites. In addition, this study uses the XGBoost model to output a feature importance ranking, which identifies the features that most affect citrus quality and gives agricultural experts and growers a reference for optimizing the citrus planting environment. |
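
The labeling step in (1) can be illustrated with a minimal sketch of the entropy weight method. It is not the study's implementation. It assumes that every indicator is benefit-type (larger is better), that columns are min-max scaled before weighting, and that a site is labeled "good" when its composite score is at or above the median. The function names and the toy numbers are hypothetical and do not come from the "Ehime 28" data.

```python
import math
from statistics import median


def min_max_columns(rows):
    """Rescale each indicator (column) to [0, 1]; a constant column becomes all zeros."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled  # list of columns, not rows


def entropy_weights(rows):
    """Return one weight per indicator; indicators that vary more across sites weigh more."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples to compute entropy")
    divergences = []
    for col in min_max_columns(rows):
        total = sum(col)
        if total == 0:
            # A constant indicator carries no information, so its weight is zero.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for v in col:
            p = v / total
            if p > 0:  # the term 0 * ln(0) is taken as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalise entropy to [0, 1]
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0:
        # Every indicator is constant: fall back to equal weights.
        return [1.0 / len(divergences)] * len(divergences)
    return [d / d_sum for d in divergences]


def composite_scores(rows, weights):
    """Weighted sum of the scaled indicators for each sample."""
    cols = min_max_columns(rows)
    return [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(len(rows))]


def grade_labels(scores):
    """Assumed rule: 1 ("good") if the score is at or above the median, else 0."""
    cut = median(scores)
    return [1 if s >= cut else 0 for s in scores]


# Toy data (made up): four sites, three quality or sensory indicators each.
toy_sites = [[1, 5, 3], [2, 5, 1], [3, 5, 2], [4, 5, 9]]
toy_weights = entropy_weights(toy_sites)
toy_labels = grade_labels(composite_scores(toy_sites, toy_weights))
print(toy_weights, toy_labels)
```

Cost-type indicators (smaller is better) would need to be inverted before scaling, and the median cut-off is only one possible way to split sites into two grades.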