Analysis And Forecast Of GDP Based On Ensemble Learning Algorithm

Posted on: 2023-09-05; Degree: Master; Type: Thesis
Country: China; Candidate: R Xu
GTID: 2530306614487544; Subject: Applied statistics
Abstract/Summary:
Forecasting economic variables plays an important role in macroeconomic regulation and government decision-making. The GDP growth rate is one of the key macroeconomic indicators, and forecasting it has long been an important research area in macroeconomics. However, the severe epidemic and the complicated economic environment have made it harder to predict. Machine learning offers an approach that differs from traditional linear forecasting models: it learns the relationship between input and output variables autonomously and is widely applicable. This thesis makes a comparative analysis of GDP growth rate forecasts produced by the Random Forest, Gradient Boosting Regression Tree (GBRT) and LSTM algorithms. Random Forest computes quickly and generalizes well. GBRT can build very reliable prediction models and can also handle multicollinearity among variables. LSTM is well suited to building time series models and has performed well in previous research. Random Forest and GBRT can both compute an importance score for each input feature, which shows how much that feature contributes to the output. This thesis therefore selects input features by their importance scores, which makes the model more interpretable and improves computational efficiency.

With reference to the relevant literature, 16 input variables were selected, comprising basic features drawn from mainstream macroeconomic theory and extended features reflecting China's economic situation. The data run from the first quarter of 2001 to the fourth quarter of 2020. Modeling was implemented in Python. A dynamic prediction window was constructed, hyperparameters were chosen by random search and grid search, and overfitting was addressed with 10-fold cross-validation. One-step-ahead out-of-sample forecasts of the GDP growth rate were produced for the first quarter of 2011 through the fourth quarter of 2020, and the predictive performance of each model under the two feature systems was compared using RMSE, MAE, MAPE and other evaluation indexes.

The empirical results show that adding the extended features improves the forecasts of both ensemble algorithms, and that GBRT generalizes better than Random Forest. LSTM behaves in the opposite way: the model becomes worse after the extended features are added, which differs from the findings of Xiao (2020). Because the 2020 data fluctuated greatly due to the epidemic, the four quarters of 2020 were removed and the models were re-estimated. GBRT in the extended feature space then showed the best generalization performance, with the smallest value on every evaluation index, far below those of the LSTM model. The features of this model were then ranked by importance, and the Industrial Added Value, the Added Value of the Tertiary Industry, the Total Export Value, the Turnover of the Shanghai Stock Market and the National Housing Prosperity Index were found to play an important role in the forecast. Taking 95% of total importance as the threshold, the five lowest-ranked features were removed, and GBRT and LSTM were re-fitted on the remaining, higher-ranked features. The RMSE of both models decreased significantly, indicating that feature selection improves the prediction of outliers. In addition, after feature selection the LSTM model outperformed the LSTM model built on the full extended feature system on every index. Taking both computational efficiency and prediction accuracy into account, the empirical results show that building a new model on features screened by importance scores may yield better performance.
Keywords/Search Tags: Machine Learning, GDP growth rate, Random Forest, GBRT, LSTM, Cross Validation
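The abstract states that hyperparameters were chosen by random search followed by grid search, with 10-fold cross-validation. The sketch below illustrates that two-stage procedure using only the Python standard library. It is not the thesis's code. Three things are assumptions: the shuffled fold split, scoring each setting by mean out-of-fold RMSE, and the `Ridge1D` estimator. `Ridge1D` is a hypothetical one-feature stand-in for GBRT or Random Forest, used so the example runs without extra packages. Any object with scikit-learn-style `fit`/`predict` methods could be passed as `make_model` instead.

```python
import random
from itertools import product
from math import sqrt


def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumption: shuffled (not time-ordered) folds
    return [idx[i::k] for i in range(k)]


class Ridge1D:
    """Hypothetical stand-in estimator: y ~ w * x with L2 penalty `lam`."""

    def __init__(self, lam=0.0):
        self.lam = lam

    def fit(self, xs, ys):
        sxy = sum(x * y for x, y in zip(xs, ys))
        sxx = sum(x * x for x in xs)
        self.w = sxy / (sxx + self.lam)
        return self

    def predict(self, xs):
        return [self.w * x for x in xs]


def cv_rmse(make_model, params, xs, ys, k=10):
    """Mean out-of-fold RMSE of one hyperparameter setting under k-fold CV."""
    scores = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = make_model(**params)
        model.fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in fold])
        sq = [(p - ys[i]) ** 2 for p, i in zip(preds, fold)]
        scores.append(sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)


def random_search(make_model, ranges, xs, ys, n_iter=20, k=10, seed=0):
    """Stage 1: sample settings uniformly from (low, high) ranges; keep the best."""
    rng = random.Random(seed)
    trials = [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
              for _ in range(n_iter)]
    return min(trials, key=lambda p: cv_rmse(make_model, p, xs, ys, k))


def grid_search(make_model, grid, xs, ys, k=10):
    """Stage 2: try every combination in an explicit grid; keep the best."""
    names = list(grid)
    combos = [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
    return min(combos, key=lambda p: cv_rmse(make_model, p, xs, ys, k))


# Hypothetical toy data: 40 noiseless observations of y = 2x.
xs = list(range(1, 41))
ys = [2.0 * x for x in xs]
coarse = random_search(Ridge1D, {"lam": (0.0, 10.0)}, xs, ys)
# Refine with a small grid built around the random-search winner.
fine = grid_search(Ridge1D, {"lam": [0.0, coarse["lam"] / 2, coarse["lam"]]}, xs, ys)
```

The random stage explores a wide range cheaply, and the grid stage then searches exhaustively only near the best random draw. On this noiseless toy data the unpenalized setting `lam = 0.0` fits exactly, so the grid stage returns it.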
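The abstract describes a dynamic prediction window with one-step-ahead out-of-sample forecasts from 2011Q1 to 2020Q4. A minimal sketch of such a loop follows. It assumes that row `t` of `X` holds the predictors used for quarter `t`; how the thesis aligns or lags its predictors is not stated here. It also assumes that the model is refitted at every step. The abstract does not say whether the window expands or rolls with a fixed length, so both options are offered through the hypothetical `window` argument. `MeanModel` is a hypothetical placeholder. A scikit-learn regressor such as `GradientBoostingRegressor` could be passed as `make_model` instead, because it exposes the same `fit`/`predict` methods.

```python
def one_step_forecasts(make_model, X, y, start, window=None):
    """Refit on the data before period t and forecast period t, for t = start..len(y)-1.

    window=None: expanding window (all periods 0..t-1).
    window=w:    rolling window (the w most recent periods before t).
    """
    preds = []
    for t in range(start, len(y)):
        lo = 0 if window is None else max(0, t - window)
        model = make_model()
        model.fit(X[lo:t], y[lo:t])
        preds.append(model.predict(X[t:t + 1])[0])  # forecast only period t
    return preds


class MeanModel:
    """Hypothetical placeholder model: forecasts the mean of its training targets."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


# With quarterly rows starting at 2001Q1, row (2011 - 2001) * 4 == 40 is 2011Q1.
X = [[float(i)] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expanding = one_step_forecasts(MeanModel, X, y, start=4)
rolling = one_step_forecasts(MeanModel, X, y, start=4, window=2)
```

Refitting inside the loop means each forecast uses only information available before the quarter being predicted, which is what makes the evaluation out-of-sample.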
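RMSE, MAE and MAPE, the evaluation indexes named in the abstract, have standard definitions. A minimal plain-Python sketch follows. The sample values are hypothetical and are not results from the thesis. MAPE divides by the actual value, so it becomes unstable when the growth rate is close to zero.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: squaring weights large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [6.0, 7.0, 8.0]      # hypothetical growth rates, %
predicted = [5.0, 7.5, 8.0]   # hypothetical forecasts, %
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE squares each error before averaging, so it is more sensitive to a few large misses than MAE is.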
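The abstract ranks features by importance and drops the five lowest-ranked ones using a 95% importance threshold. One reading of that rule is a cumulative-share cut-off: keep the top-ranked features until their combined share of total importance first reaches 95%, and drop the rest. The abstract does not confirm this reading, so it is an assumption here. The feature names and scores below are hypothetical. With scikit-learn, a fitted `RandomForestRegressor` or `GradientBoostingRegressor` exposes impurity-based scores as `feature_importances_`, and these can be paired with column names via `dict(zip(names, model.feature_importances_))`.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Keep the highest-ranked features until their share of total importance reaches threshold.

    Assumed reading of the thesis's 95% rule; the feature that crosses the threshold is kept.
    """
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, score in ranked:
        if cumulative >= threshold:
            break  # threshold already reached: drop this and all lower-ranked features
        kept.append(name)
        cumulative += score / total
    return kept


# Hypothetical importance scores, not values from the thesis.
importances = {"x1": 0.40, "x2": 0.30, "x3": 0.20, "x4": 0.06, "x5": 0.03, "x6": 0.01}
kept = select_by_cumulative_importance(importances)
dropped = [name for name in importances if name not in kept]
print(kept, dropped)
```

Normalizing by the total means the rule also works for scores that do not already sum to 1.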