| With the rapid development of cities, urban commodity housing prices continue to rise, pushing some people with housing needs into the rental market. However, the current housing rental market has many problems, the most prominent being its unhealthy development driven by rapid and unreasonable rent growth. Forecasting and studying rental prices is therefore conducive to reasonable pricing. This dissertation obtains real housing rental data through web crawler technology and applies machine learning algorithms to predict housing rents, providing a reference for renting out, selecting, and supervising rental housing. The main work is as follows: (1) Data preparation and preprocessing. Python web crawler technology is used to crawl Shenzhen’s online housing rental data from Lianjia.com. First, missing values and outliers in the data are handled. Second, to make the data meet the requirements of the models, two transformations, the logarithmic transformation and the Box-Cox transformation, are applied to the highly skewed features and compared, and the Box-Cox transformation is found to be more suitable. Finally, features are selected by combining two methods: random Sensitive importance evaluation and Pearson’s correlation coefficient. (2) Predictive model building. Five single regression models are constructed: multiple linear regression, nearest neighbor regression, random forest regression, gradient boosting tree regression, and XGBoost regression, with R² and NRMSE as evaluation metrics. XGBoost performs best and multiple linear regression performs worst. (3) Construction of a two-layer Stacking combination model. To further improve predictive ability, a two-layer Stacking combination model is built. The first-layer learners are the random forest, nearest neighbor, gradient boosting tree, and XGBoost regression models, and the second layer uses a multiple linear regression model as the meta-learner. Comparative experiments show that the Stacking combination model outperforms the other models, with an R² of 0.930830 and an NRMSE of 0.273301. On this data set, therefore, the Stacking combination model offers high prediction accuracy, good adaptability to the data, and high computational efficiency, and can serve as a reference for landlord pricing, tenant selection, and supervision of the housing rental market. |
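
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how NRMSE is normalized, so dividing the RMSE by the population standard deviation of the observed rents is an assumption made here; normalizing by the range or the mean is also common. The function names and the rent values are hypothetical and serve only as an illustration.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the population standard deviation of y_true.

    The normalizer is an assumption; the abstract does not specify it.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    std_y = math.sqrt(sum((t - mean_y) ** 2 for t in y_true) / n)
    return rmse / std_y


# Hypothetical monthly rents (observed vs. predicted), for illustration only.
observed = [3000, 4500, 5200, 6100]
predicted = [3200, 4300, 5000, 6400]

print("R^2   =", round(r_squared(observed, predicted), 6))
print("NRMSE =", round(nrmse(observed, predicted), 6))
```

A higher R² and a lower NRMSE both indicate a better fit, which is the sense in which the models in the abstract are ranked.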