| Real estate, as a pillar industry in China, has long attracted attention from all sectors of society. The housing price index, an important indicator of real estate price changes in key Chinese cities, is released by the National Bureau of Statistics in the middle of the month following the statistical month, so its publication lags the period it describes. With the spread of the Internet, people have become accustomed to using search engines to find information, leaving a large amount of web search data online. To address the time lag in the release of the housing price index, this paper uses Python tools to capture real-time search keyword data from the Baidu Index website and, in line with the statistical cycle of the housing price index, constructs a COLGWO-Stacking model for predicting the index. The research proceeds as follows. First, based on Marshall's equilibrium price theory and Keynes's propagation lag theory, the qualitative relationship between web search data and the housing price index is analyzed. On this basis, the core keywords related to the factors affecting housing prices are determined objectively using the CiteSpace tool, and these keywords are extended with long-tail keywords and demand maps to obtain an initial vocabulary of 119 web search keywords in 8 categories. Second, to improve the accuracy, diversity, and generalization ability of the Stacking ensemble strategy, four algorithms (XGBoost, LightGBM, SVR, and MLP) are selected as base learners, and the MLR algorithm is used as the meta-learner. To avoid prediction errors caused by poorly chosen base-learner hyperparameters, a chaotic search factor and a reverse learning strategy are combined to design a chaotic reverse grey wolf optimization algorithm (COLGWO), which is used to optimize the hyperparameters of the base learners and construct the COLGWO-Stacking composite prediction model. Finally, Spearman correlation analysis is combined with stepwise regression to screen the web search keyword library for a predictive keyword indicator system for each of the four first-tier cities (Beijing, Shanghai, Guangzhou, and Shenzhen), and the COLGWO-Stacking model is used for prediction. The empirical results show that the Stacking fusion model predicts better than any single machine learning model; that optimizing the base learners' hyperparameters with the improved grey wolf algorithm further improves the prediction accuracy of the composite model; and that the COLGWO-Stacking model is the most stable of the ensemble methods compared. The COLGWO-Stacking composite prediction model constructed in this paper can therefore predict the housing price index with high precision, and the predicted index is available about 15 days earlier than the index published by the National Bureau of Statistics. This real-time prediction method can provide an important reference for decision-making and investment in the real estate market. |
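
The abstract does not spell out how the chaotic search factor and the reverse learning strategy enter the grey wolf optimizer, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes a logistic map for the chaotic initial population and an opposition step (`lower + upper - x`) applied once to that population, followed by the standard grey wolf position update. All function names (`logistic_map_sequence`, `opposite`, `colgwo_sketch`) and parameter values are hypothetical, and a toy sphere function stands in for the cross-validated error of a base learner.

```python
import random


def logistic_map_sequence(length, seed=0.37):
    """Chaotic values in [0, 1] from the logistic map z <- 4 z (1 - z) (assumed map)."""
    z, values = seed, []
    for _ in range(length):
        z = 4.0 * z * (1.0 - z)
        values.append(z)
    return values


def opposite(point, bounds):
    """Opposition ("reverse") point: mirror each coordinate inside its bounds."""
    return [lo + hi - x for x, (lo, hi) in zip(point, bounds)]


def clip(point, bounds):
    """Keep a candidate inside the search box."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(point, bounds)]


def colgwo_sketch(objective, bounds, n_wolves=12, n_iter=60, rng_seed=0):
    """Minimise `objective` over `bounds`; returns the best point seen."""
    rng = random.Random(rng_seed)
    dim = len(bounds)

    # Assumption: chaotic initialisation via the logistic map.
    chaos = logistic_map_sequence(n_wolves * dim)
    wolves = [
        [lo + chaos[i * dim + d] * (hi - lo) for d, (lo, hi) in enumerate(bounds)]
        for i in range(n_wolves)
    ]
    # Assumption: keep the best n_wolves among the wolves and their opposites.
    candidates = wolves + [opposite(w, bounds) for w in wolves]
    wolves = sorted(candidates, key=objective)[:n_wolves]
    best = list(wolves[0])

    for t in range(n_iter):
        wolves.sort(key=objective)
        leaders = [list(w) for w in wolves[:3]]  # alpha, beta, delta
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            moved = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    estimate += leader[d] - A * abs(C * leader[d] - w[d])
                moved.append(estimate / 3.0)  # average of the three pulls
            new_wolves.append(clip(moved, bounds))
        wolves = new_wolves
        challenger = min(wolves, key=objective)
        if objective(challenger) < objective(best):
            best = list(challenger)
    return best


def sphere(point):
    """Toy objective standing in for a base learner's validation error."""
    return sum(x * x for x in point)


if __name__ == "__main__":
    search_box = [(-5.0, 5.0), (-5.0, 5.0)]
    print(colgwo_sketch(sphere, search_box))
```

In a hyperparameter search, each coordinate would correspond to one hyperparameter of a base learner (for example a learning rate or a tree depth, rounded where an integer is required), and `objective` would return that learner's validation error.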