
Research On Multi-factor Correlation Analysis Of Grain Yield Influencing Factors And Yield Forecast Method Based On Machine Learning

Posted on: 2021-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: C L Cao
GTID: 2518306473464494
Subject: Master of Engineering
Abstract/Summary:
Environmental degradation, shrinking arable land, population pressure, and global warming all affect agricultural production and threaten food security. Accurate grain yield forecasts are therefore of great significance for guiding agricultural production and for maintaining both the continued growth of grain output and food security. However, the currently popular models for analyzing and predicting grain yield have several shortcomings: some focus only on correlation analysis of grain yield; some focus only on predicting grain yield and ignore the relationship between yield and its influencing factors; some analyze the influencing factors but ignore possible multicollinearity in the model; and some have relatively low prediction accuracy.

To address these shortcomings, this thesis predicts the grain output of Henan Province from 2014 to 2017 with a combined forecasting model that chains grey relational analysis, collinearity diagnosis of the influencing factors, variable selection, and multiple linear regression. It then uses a combined GM(1,1) and BP neural network model to predict the influencing factors and the output for a period into the future.

After examining the trend of grain production in Henan Province, the thesis first elaborates on the mechanism by which various factors influence grain production in the province. It then identifies nine main influencing factors: rural population, total power of agricultural machinery, irrigation area, fertilizer use, rural electricity consumption, pesticide use, film use, diesel use, and planting area, denoted X1, X2, X3, X4, X5, X6, X7, X8, and X9 respectively.

The specific work done in this thesis is as follows.
(1) Data were collected and missing values were filled in.
(2) Using data on the nine influencing factors from 1990 to 2017, the factors related to grain production in Henan Province were analyzed with grey relational analysis (GRA).
(3) The Pearson correlation coefficient and the variance inflation factor (VIF) were used to diagnose collinearity among the factors selected by the grey relational analysis.
(4) Lasso variable selection and ridge regression variable selection were used to eliminate influencing factors that may produce multicollinearity.
(5) The data were divided into a training set and a test set, and the grain output of Henan Province from 2014 to 2017 was predicted with four models: multiple linear regression; random forest; grey relational analysis, Lasso variable selection, and multiple linear regression; and grey relational analysis, ridge regression variable selection, and multiple linear regression.
(6) Because the three factors retained by Lasso variable selection (rural employment population, irrigated area, and fertilizer usage) gave the best fit to the 2014-2017 grain output, the GM(1,1) model and the BP neural network were combined with these Lasso-selected factors to fit the grain output of Henan Province from 2018 to 2021.

The experimental results show that both combined models (grey relational analysis, multicollinearity diagnosis, Lasso or ridge regression variable selection, and multiple linear regression) predict more accurately than the plain multiple linear regression model and the random forest model. The Lasso-based model has the highest prediction accuracy, followed by the ridge-regression-based model. Measured by average relative error, the Lasso-based model improves on the multiple linear regression model by about 3.3% and on the random forest model by about 4.3%; the ridge-regression-based model improves on the multiple linear regression model by 2.3% and on the random forest model by about 3.3%. Both models are effective methods for analyzing the factors that influence grain production and for predicting it. The experiment with the combined GM(1,1) and BP neural network model gives an average relative error of 2.55 for grain production from 2018 to 2019, which indicates that it is an effective method for short-term grain production forecasting.
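The two grey-system building blocks named in the abstract, grey relational analysis and the GM(1,1) model, can be sketched in a few lines. The block below is only an illustration under stated assumptions, not the thesis's implementation: the function names `grey_relational_grades` and `gm11_forecast` are hypothetical, the grades follow Deng's formulation with mean normalization and a distinguishing coefficient of 0.5 (the abstract does not say which normalization or coefficient was used), and the numbers in the usage lines are made up rather than Henan Province data.

```python
import math


def grey_relational_grades(reference, factors, rho=0.5):
    """Deng's grey relational grade of each factor series against the reference.

    Assumption: series are mean-normalised and rho (distinguishing
    coefficient) is 0.5; the thesis may have used other choices.
    """
    def mean_normalise(series):
        m = sum(series) / len(series)
        return [v / m for v in series]

    ref = mean_normalise(reference)
    # Absolute differences between the reference and each factor, per time point.
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, mean_normalise(series))]
        for name, series in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every factor coincides with the reference
        return {name: 1.0 for name in factors}
    # Grade = mean of the grey relational coefficients over time.
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


def gm11_forecast(x0, steps):
    """Fit GM(1,1) to a positive series x0; return (fitted values, forecasts)."""
    n = len(x0)
    # 1-AGO: cumulative sums of the original series.
    x1 = []
    total = 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: means of consecutive cumulative sums.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0(k) = -a * z(k) + b (closed form for two parameters).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    if a == 0:
        raise ValueError("development coefficient is zero; series has no trend")

    def x1_hat(k):  # time response of the cumulative series, k is 0-based
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Restore the original series by differencing the cumulative estimates.
    values = [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]
    return values[:n], values[n:]


# Made-up illustrative data, not taken from the thesis.
grades = grey_relational_grades(
    [1, 2, 3, 4], {"same": [2, 4, 6, 8], "other": [4, 3, 2, 1]}
)
fitted, future = gm11_forecast([100 * 1.05 ** k for k in range(8)], steps=2)
print(grades)
print(future)
```

In a pipeline like the one described, the grades would rank the nine factors X1 to X9 against grain output, and a GM(1,1) forecast of each retained factor could then feed a downstream regression or neural network model.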
Keywords/Search Tags: correlation analysis, collinearity judgment, variable selection, grain yield forecast